i originally just wanted to say: it brings all the info with it and doesn't need lookups. but i realized i couldn't say that without explaining that this is only my intuitive impression.

you see, i thought google's voice recognition software (hereafter anthropomorphically called 'google') went the long way 'round, as it were, listening, parsing, and responding attentively to everyone individually -- at least, during the all-too-brief few months that "1-800-ask-google" (or was it 411 google?) existed.

it must be noted that, for me personally, experiences with this specific service populated almost the entirety of the Venn diagram intersection of google, phone-voice-recognition, and corporate-machine learning (either theoretically or practically). add in the couple times i tried to use google voice to take a memo, and then, well, there you have it.

it also must be noted that, historically, although able now and again to intuit the grammar & syntax of a newly encountered system, i have also -- true fact -- so flustered a phone-tree-bot (AT&T, female variety) -- and i mean *flustered* -- that it said, in a much louder, clipped, and forceful voice: "...If you want to INTERRUPT me, PLEASE SAY 'STOP.'"

nonetheless, these are some of the hypotheses which i, knowing nothing, formulated over time as i used the service:

that whatever library it loads, it already has to hand (it only needed to 'look up' actual addresses and #s)
that it takes it a moment to understand a relationship
that new implications of relationship between phonemes are stored forever as precedent
that it learns very very fast
that it applies what it has learned globally, as in, not person-specific

i noticed how quickly it trained itself to not only the sound of my voice, but to my verbal style, word order and choice of words. i tested it by having friends refer to places or ask for answers in ways specific to me, from different phones.

i decided i didn't want to speak at all one day, and answered google's questions by pressing keys - 1 for yes, 2 for no, or pressing the number that corresponded to my choice in a list. by the second question it had completely figured out what i was doing -- on the first it stumbled over the multiple choice, only natively 'understanding' the new format for yes/no.

and get this: less than a week later it was offering that format among the response options it listed for users.

all of which may not have helped at all, i realize.

and perhaps neither will this page, which i thought was pretty interesting, if over my head.

ah well, have fun!

On Sun, Apr 12, 2015 at 12:19 AM, Adam Munich <adam@aperture.systems> wrote:

I'm trying to reverse engineer the "OK google" functionality implemented in my phone.

What do you suppose I should do with those feature / data sets? Since "OK google" responds to my voice independently of the rate of speech, methinks they are using a combination of regression analysis and dynamic time warping.
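
For what it's worth, the rate-of-speech independence is the part that dynamic time warping (DTW) is normally used for. Here is a minimal sketch in Python, assuming each utterance has already been reduced to a 1-D sequence of feature values (a real system would more likely compare multi-dimensional frames). The function name and the toy sequences are made up for illustration, and none of this is a claim about what the phone actually runs:

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between two samples
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy data: a stored template, the same shape spoken at half
# speed, and an unrelated pattern of the same length as the template.
template = [0, 1, 3, 1, 0]
slow = [0, 0, 1, 1, 3, 3, 1, 1, 0, 0]
other = [3, 0, 3, 0, 3]

print(dtw_distance(template, slow))   # 0.0: stretching in time costs nothing
print(dtw_distance(template, other))  # greater than 0: the shapes differ
```

The stretched copy matches the template exactly, while the unrelated pattern does not. That is why DTW on its own can account for rate independence, but not for speaker or pitch independence.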

But it's seemingly both speaker and pitch independent too, so there must be something else going on. There's no way they implemented a full Hidden Markov Model inside the phone's DSP (it wouldn't make sense for just one hotword).

Thoughts?

Be seeing you.