To achieve this I cloned the existing OpenAM HOTP module and integrated it with the Twilio service: https://www.twilio.com/
I might do a deeper dive into exactly how this works later, but for now the code is up on GitHub:
And you can see a video of how it works below:
(I realised the quality was poor and have fixed it; please let me know if you still have issues.)
There is definitely room for improvement here. At the moment I am using a Twimlet to encode the voice; in a production deployment you will want to generate TwiML (https://www.twilio.com/docs/api/twiml) yourself in order to have more control over the voice, e.g. to introduce pauses between digits.
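To give a rough idea of what generating your own TwiML might look like, here is a minimal sketch in Python. It is not the code from my module: the function name `build_otp_twiml` and its parameters are made up for illustration, and it only relies on the standard TwiML `<Response>`, `<Say>` and `<Pause>` verbs. It reads the code out one digit at a time with a pause before each digit, and repeats the whole message in case the user misses it the first time.

```python
import xml.etree.ElementTree as ET


def build_otp_twiml(code: str, pause_seconds: int = 1, repeats: int = 2) -> str:
    """Return a TwiML document that reads a numeric code digit by digit.

    Hypothetical helper for illustration only; the name and parameters
    are assumptions, not part of OpenAM or the Twilio libraries.
    """
    if not code.isdigit():
        raise ValueError("one-time password must contain only digits")

    response = ET.Element("Response")
    for _ in range(repeats):
        # Short introduction before the digits.
        ET.SubElement(response, "Say").text = "Your one time password is"
        for digit in code:
            # <Pause length="N"/> waits N seconds before the next verb.
            ET.SubElement(response, "Pause", length=str(pause_seconds))
            ET.SubElement(response, "Say").text = digit
        # Trailing pause so the repeats do not run into each other.
        ET.SubElement(response, "Pause", length=str(pause_seconds))

    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_otp_twiml("482913"))
```

You would serve the resulting XML from a URL that Twilio fetches when the call connects, in place of the Twimlet URL. How that endpoint is hosted is up to your deployment, so I have left it out here.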