i originally just wanted to say, it brings all the info with it, doesn't need
lookups, but realized i couldn't say that without explaining that this is
my intuitive *impression.*
you see, i thought google's voice recognition software (hereafter
anthropomorphically called 'google') went the long way 'round, as it were,
listening, parsing, and responding attentively to everyone individually --
at least, during the all-too-brief few months that "1-800-ask-google" (or
was it 411 google?) existed.
it must be noted that, for me personally, experiences with this specific
service populated almost the entirety of the Venn diagram intersection of
google, phone-voice-recognition, and corporate-machine learning (either
theoretically or practically). add in the couple times i tried to use
google voice to take a memo, and then, well, there you have it.
it also must be noted that, historically, although able now and again to
intuit grammar & syntax of a newly encountered system, i have also -- true
fact -- so flustered a phone-tree-bot (AT&T, female variety) -- and i
mean *flustered* -- that it said, in a much louder, clipped, and forceful
voice: "...If you want to INTERRUPT me, PLEASE SAY 'STOP.'"
nonetheless, these are some of the hypotheses which i, knowing nothing,
formulated over time, as i used the service.
- that whatever library it loads, it already has to hand (it only needed
to 'look up' actual addresses and #s)
- that it takes it a moment to understand relationship
- that new implications of relationship between phonemes are stored
forever as precedent
- that it learns very very fast
- that it applies what it has learned globally, as in, not
person-specific
i noticed how quickly it trained itself not only to the sound of my voice,
but to my verbal style, word order and choice of words. i tested it by
having friends refer to places or ask for answers in ways specific to me,
from different phones.
i decided i didn't want to speak at all one day, and answered google's
questions by pressing keys - 1 for yes, 2 for no, or pressing the number
that corresponded to my choice in a list. by the second question it had
completely figured out what i was doing -- on the first it stumbled over
the multiple choice, only natively 'understanding' the new format for
yes/no.
and get this: less than a week later it was offering that format among the
response options it listed for users.
all of which may not have helped at all, i realize.
and perhaps neither will this page, which i thought was pretty interesting,
if over my head.
ah well, have fun!
On Sun, Apr 12, 2015 at 12:19 AM, Adam Munich <adam(a)aperture.systems> wrote:
I'm trying to reverse engineer the "OK google" functionality implemented in
my phone.
[image: 0.png]
What do you suppose I do with those feature / data sets? Since "OK google"
responds to my voice independently of the rate of speech, methinks they are
using a combination of regression analysis and dynamic time warping.
But, it's seemingly both speaker and pitch independent too, so there must
be something else going on. There's no way they implemented a full Hidden
Markov Model inside the phone's DSP (it wouldn't make sense for just one
hotword).
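For what it's worth, the rate-of-speech independence described above is the kind of thing dynamic time warping (DTW) is usually used for. Below is a minimal sketch in Python. It is not Google's hotword method (nothing in this thread confirms what the phone actually runs), and the function name dtw_distance and the toy "feature tracks" are made up purely for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    Runs in O(len(a) * len(b)) time. Each sample may be stretched
    (matched against several samples of the other sequence), which is
    what makes the comparison tolerant of speaking rate.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local mismatch
            cost[i][j] = d + min(
                cost[i - 1][j],               # stretch b[j-1]
                cost[i][j - 1],               # stretch a[i-1]
                cost[i - 1][j - 1],           # advance both
            )
    return cost[n][m]


# Hypothetical one-dimensional feature tracks (assumed data, not real audio):
template = [0, 1, 3, 2, 0]                    # stored hotword contour
slow = [0, 0, 1, 1, 3, 3, 2, 2, 0, 0]         # same contour at half speed
other = [3, 0, 2, 0, 3]                       # a different contour

print(dtw_distance(template, slow))           # 0.0: warping absorbs the slowdown
print(dtw_distance(template, other))          # greater than zero
```

Note that this only handles differences in timing. It does nothing for speaker or pitch independence, so it is consistent with the suspicion that something else must also be going on.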
Thoughts?
---
Aperture Systems: Redefining Radiography -
http://aperture.systems/
http://adammunich.com/ - Cell: +1-650-452-0554
Be • knowledgeable • social • patient • fearless • compassionate • fun •
humble • forgiving.
Be a leader
--
*Be seeing you.*