[sudo-discuss] Fwd: What to do with these features?

Wed Apr 15 18:18:19 PDT 2015

i originally just wanted to say that it brings all the info with it and
doesn't need lookups, but realized i couldn't say that without explaining
that this is my intuitive *impression.*

you see, i thought google's voice recognition software (hereafter
anthropomorphically called 'google') went the long way 'round, as it were,
listening, parsing, and responding attentively to everyone individually --
at least, during the all-too-brief few months that "1-800-ask-google" (or
was it 411 google?) existed.

it must be noted that, for me personally, experiences with this specific
service populated almost the entirety of the Venn diagram intersection of
google, phone-voice-recognition, and corporate machine learning (either
theoretically or practically).  add in the couple times i tried to use
google voice to take a memo, and then, well, there you have it.

it also must be noted that, historically, although able now and again to
intuit grammar & syntax of a newly encountered system, i have also -- true
fact -- so flustered a phone-tree-bot (AT&T, female variety) -- and i
mean *flustered* -- that it said, in a much louder, clipped, and forceful
voice: "...If you want to INTERRUPT me, PLEASE SAY 'STOP.'"

nonetheless, these are some of the hypotheses which I, knowing nothing,
formulated over time, as i used the service.

   - that whatever library it loads, it already has to hand (it only needed
   to 'look up' actual addresses and #s)
   - that it takes it a moment to understand relationship
   - that new implications of relationship between phonemes are stored
   forever as precedent
   - that it learns very very fast
   - that it applies what it has learned globally, as in, not
   person-specific

i noticed how quickly it trained itself not only to the sound of my voice,
but also to my verbal style, word order and choice of words.  i tested it by
having friends refer to places or ask for answers in ways specific to me,
from different phones.

i decided i didn't want to speak at all one day, and answered google's
questions by pressing keys - 1 for yes, 2 for no, or pressing the number
that corresponded to my choice in a list.  by the second question it had
completely figured out what i was doing -- on the first it stumbled over
the multiple choice, only natively 'understanding' the new format for
yes/no.

and get this: less than a week later it was offering that format among the
response options it listed for users.

all of which may not have helped at all, i realize.

and perhaps neither will this page, which i thought was pretty interesting,
if over my head.

ah well, have fun!

On Sun, Apr 12, 2015 at 12:19 AM, Adam Munich <adam at aperture.systems> wrote:

I'm trying to reverse engineer the "OK google" functionality implemented in
my phone.

[image: 0.png]

What do you suppose I do with those feature / data sets? Since "OK google"
responds to my voice independently of the rate of speech, methinks they are
using a combination of regression analysis and dynamic time warping.
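
For reference, here is a minimal sketch of how time warping can make a
match independent of the rate of speech. It assumes each utterance has
already been reduced to a sequence of per-frame feature vectors; the
function name dtw_distance and the toy one-dimensional "features" are
hypothetical, and nothing here is taken from the phone's actual
implementation.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    a, b: lists of equal-dimension tuples (one tuple per audio frame).
    A frame in one sequence may align with several frames in the other,
    which is what absorbs differences in speaking rate.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b frame is reused
                cost[i][j - 1],      # b advances, a frame is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Toy data (assumed, for illustration only): a template, the same shape
# spoken at half speed, and a differently shaped utterance.
template = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,),
        (2.0,), (1.0,), (1.0,), (0.0,), (0.0,)]
other = [(2.0,), (2.0,), (0.0,), (2.0,), (2.0,)]

print(dtw_distance(template, slow))   # stretched copy aligns at zero cost
print(dtw_distance(template, other))  # different shape gives a positive cost
```

A hotword detector built this way would compare incoming frames against
a stored template and fire when the cost falls under some threshold.
Note that this only handles tempo; it does nothing for speaker or pitch
differences.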

But, it's seemingly both speaker and pitch independent too, so there must
be something else going on. There's no way they implemented a full Hidden
Markov Model inside the phone's DSP (it wouldn't make sense for just one
hotword).

Thoughts?

---

Aperture Systems: Redefining Radiography -  http://aperture.systems/

http://adammunich.com/ - Cell: +1-650-452-0554

Be • knowledgeable •  social • patient • fearless • compassionate • fun •
humble • forgiving.

Be a leader

_______________________________________________

sudo-discuss mailing list

sudo-discuss at lists.sudoroom.org

https://lists.sudoroom.org/listinfo/sudo-discuss

-- 

*Be seeing you.*