i originally just wanted to say that it brings all the info with it and doesn't need lookups, but realized i couldn't say that without explaining that this is only my intuitive impression.


you see, i thought google's voice recognition software (hereafter anthropomorphically called 'google') went the long way 'round, as it were, listening, parsing, and responding attentively to everyone individually -- at least, during the all-too-brief few months that "1-800-ask-google" (or was it 411 google?) existed.


it must be noted that, for me personally, experiences with this specific service populated almost the entirety of the Venn diagram intersection of google, phone-voice-recognition, and corporate-machine learning (either theoretically or practically).  add in the couple times i tried to use google voice to take a memo, and then, well, there you have it.


it also must be noted that, historically, although able now and again to intuit the grammar & syntax of a newly encountered system, i have also -- true fact -- so flustered a phone-tree-bot (AT&T, female variety) -- and i mean *flustered* -- that it said, in a much louder, clipped, and forceful voice: "...If you want to INTERRUPT me, PLEASE SAY 'STOP.'"


nonetheless, these are some of the hypotheses which i, knowing nothing, formulated over time as i used the service:


  • that whatever library it loads it already has to hand (it only needed to 'look up' actual addresses and #s)
  • that it takes it a moment to understand a relationship
  • that new implications of relationship between phonemes are stored forever as precedent
  • that it learns very very fast
  • that it applies what it has learned globally, as in, not person-specific


i noticed how quickly it trained itself to not only the sound of my voice, but to my verbal style, word order and choice of words.  i tested it by having friends refer to places or ask for answers in ways specific to me, from different phones.


i decided i didn't want to speak at all one day, and answered google's questions by pressing keys - 1 for yes, 2 for no, or pressing the number that corresponded to my choice in a list.  by the second question it had completely figured out what i was doing -- on the first it stumbled over the multiple choice, only natively 'understanding' the new format for yes/no.


and get this: less than a week later it was offering that format among the response options it listed for users.


all of which may not have helped at all, i realize.


and perhaps neither will this page, which i thought was pretty interesting, if over my head.


ah well, have fun!





On Sun, Apr 12, 2015 at 12:19 AM, Adam Munich <adam@aperture.systems> wrote:

I'm trying to reverse engineer the "OK google" functionality implemented in my phone. 


0.png


What do you suppose I do with those feature / data sets? Since "OK google" responds to my voice independently of the rate of speech, methinks they are using a combination of regression analysis and dynamic time warping. 
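
For anyone following along, here is a minimal sketch of the time-warping idea mentioned above (usually called dynamic time warping, DTW), showing why it tolerates changes in speaking rate. It is only an illustration, not a claim about what Google actually runs: the function name `dtw_distance` and the toy "feature tracks" are made up for the example, and a real hotword detector would compare multi-dimensional feature vectors rather than single numbers.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (hypothetical helper)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # either sequence may "stall" on a frame, or both advance together
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b stalls
                                 cost[i][j - 1],      # b advances, a stalls
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Made-up toy data: one contour, the same contour spoken at half speed,
# and an unrelated contour.
template = [0, 1, 3, 1, 0]
slow = [0, 0, 1, 1, 3, 3, 1, 1, 0, 0]
other = [3, 0, 3, 0, 3]

print(dtw_distance(template, slow))   # stretched copy aligns at zero cost
print(dtw_distance(template, other))  # different shape costs more
```

The stretched copy matches at zero cost because the alignment is allowed to repeat frames, which is the rate independence in question. Speaker and pitch independence would have to come from the features themselves or from some other model; this sketch does nothing for those.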


But it's seemingly both speaker and pitch independent too, so there must be something else going on. There's no way they implemented a full Hidden Markov Model inside the phone's DSP (it wouldn't make sense for just one hotword).


Thoughts?




---

Aperture Systems: Redefining Radiography -  http://aperture.systems/

http://adammunich.com/ - Cell: +1-650-452-0554


Be • knowledgeable •  social • patient • fearless • compassionate • fun • humble • forgiving. 


Be a leader




_______________________________________________

sudo-discuss mailing list

sudo-discuss@lists.sudoroom.org

https://lists.sudoroom.org/listinfo/sudo-discuss





-- 

Be seeing you.