Found an interesting piece of research by Hunter (2011) that at least articulates the question well. (The linked source is a PDF from something called "ELT Journal Advanced Access, March 15, 2011.") Hunter did a small study using a system called "Small Talk" that seemed to suggest some preliminary technology-based strategies that are worth a look sometime.
The "problem," of course, is trying to figure out whether accuracy and fluency need to be addressed at the same time (within the same class period, for example) or whether focused doses of each at different times, in different classes, are sufficient. Most theorists opt for the latter in very general terms, the assumption being that from there it is the learner's job to integrate and reconcile the two.
"Haptic-integrated clinical pronunciation" (HICP) assumes, on the other hand, that it is the responsibility of the instructional program to provide practice that does both simultaneously. In part, that is done by setting up "kinaesthetic monitoring" of targeted sounds prior to engagement in conversational practice, somewhat analogous to what is done visually and auditorily in "Small Talk." See this blog post and the links to several others on the use of different "channels" in HICP work. The idea is to allow both accuracy and fluency work to be relegated to control and monitoring by "the body" in a less conscious channel that will not interfere with conscious thought any more than absolutely necessary.
There is, of course, more than one way to "skin this cat," but none more moving and touching, to be sure.