In a study of what they term "inattentional deafness," using MEG (magnetoencephalography), the researchers were able to identify in the brain both the place and point at which auditory and visual processing in effect "compete" for prominence. As has been reported more informally in several earlier posts, visual consistently trumps auditory, which accounts for the common life-ending experience of having been oblivious to the sound of screeching tires while crossing the street fixated on a smartphone screen . . . The same applies, by the way, for haptic perception as well--except in some cases where movement, touch, and auditory team up to override visual.
The classic "audio-lingual" method of language and pronunciation teaching, which made extensive use of repetition and drill, relied on a wide range of visual aids and color schemas, often with the rationale of maintaining learner attention. Even the sterile, visual isolation of the language lab's individual booth may have been especially advantageous for some--but obviously not for everybody!
What that research "points to" (pardon the visual-kinesthetic metaphor) is more systematic control of attention (or inattention) to the visual field in teaching and learning pronunciation. Computer mediated applications go to great lengths to manage attention but, ironically, forcing the learner's eyes to focus or concentrate on words and images, no matter how engaging, may, according to this research, also function to negate or at least lesson attention to the sounds and pronunciation. Hence, the intuitive response of many learners to shut their eyes when trying to capture or memorize sound. (There is, in fact, an "old" reading instruction system called the "Look up, say" method.)
The same underlying, temporary "inattention deafness" also probably applies to the use of color associated with phonemes --or even the IPA system of symbols in representing phonemes. Although such visual systems do illustrate important relationships between visual schemas and sound that help learners understand the inventory of phonemes and their connection to letters and words in general, in the actual process of anchoring and committing pronunciation to memory, they may in fact diminish the brain's ability to efficiently and effectively encode the sound and movement used to create it.
The haptic (pronunciation teaching) answer is to focus more on movement, touch and sound, integrating those modalities with visual.The conscious focus is on gesture terminating in touch, accompanied by articulating the target word, sound or phrase simultaneously with resonant voice. In many sets of procedures (what we term, protocols) learners are instructed to either close their eyes or focus intently on a point in the visual field as the sound, word or phrase to be committed to memory is spoken aloud.
The key, however, may be just how you manage those modalities, depending on your immediate objectives. If it is phonics, then connecting letters/graphemes to sounds with visual schemas makes perfect sense. If it is, on the other hand, anchoring or encoding pronunciation (and possibly recall as well), the guiding principle seems to be that sound should be best heard (and experienced somatically, in the body) . . . but (to the extent possible) not seen!
See what I mean? (You heard it here!)
Molloy, K., Griffiths, T., Chait, M., and Lavie, N. 2015. Inattentional Deafness: Visual Load Leads to Time-Specific Suppression of Auditory Evoked Responses. Journal of Neuroscience 35 (49): 16-46.