[sudo-discuss] We built voice modulation to mask gender in technical interviews. Here’s what happened. – interviewing.io blog

Jason Miller jason at redconfetti.com
Sun Jul 24 08:34:38 PDT 2016


Thanks for sharing this.

----------
Jason Miller


On Fri, Jul 15, 2016 at 9:50 PM, Candace Lazarou <candacelazarou at gmail.com>
wrote:

> I 100% believe that is a determining factor, based on my socialization,
> and the socialization of women and girls I've known throughout my life. I
> happen to be one of the lucky few who had a contrary nature and decided I
> would adopt the lessons being taught to my male peers about faking it till
> I made it and pushing past failure.
>
> But I disagree that creating space for women or other minority
> demographics in STEM fields either exacerbates this "feminine" insecurity
> or does not help the situation (see my discussion with Romy on encouraging
> a recent bootcamp grad). Connecting with other women in programming has
> been a motivating factor for me, and I've been hearing similar reports from
> other folks in Women Who Code in the year I've been a chapter director.
>
> I think there are women like Romy and me who thrive when thrown to the
> wolves, and there are women who can benefit from an intermediary stage
> where other women are cheering them on, but we all deserve to code!
>
> I'm glad you're bringing this particular issue to the fore though, Romy,
> because it is far too often unaddressed amongst aspiring programmers, and I
> plan to shape my 2017 organizing in a way that promotes the psychology
> required to succeed as an engineer. Genuine thanks for that.
>
> On Tue, Jul 5, 2016 at 9:09 AM Patrik D'haeseleer <patrikd at gmail.com>
> wrote:
>
>> TL;DR:
>>
>> Surprisingly, the perceived gender of the voice had no effect on results.
>> The main difference seemed to lie in much higher numbers of women leaving
>> the platform after one or two poor interviews.
>>
>> *"it’s not about systemic bias against women or women being bad at
>> computers or whatever. Rather, it’s about women being bad at dusting
>> themselves off after failing."*
>>
>> Patrik
>> On Jul 5, 2016 6:36 AM, "Romy Ilano" <romy at snowyla.com> wrote:
>>
>>> Maybe this is another reason why coed groups may benefit women more than
>>> all-women groups?
>>>
>>>
>>>
>>> http://blog.interviewing.io/we-built-voice-modulation-to-mask-gender-in-technical-interviews-heres-what-happened/
>>>
>>> We built voice modulation to mask gender in technical interviews. Here’s
>>> what happened.
>>> June 29th, 2016
>>> Posted by Aline Lerner <https://twitter.com/alinelernerLLC>.
>>>
>>> interviewing.io <http://www.interviewing.io> is a platform where people
>>> can practice technical interviewing anonymously and, in the process, find
>>> jobs based on their interview performance rather than their resumes. Since
>>> we started, we’ve amassed data from thousands of technical interviews, and
>>> in this blog, we routinely share some of the surprising stuff we’ve
>>> learned. In this post, I’ll talk about what happened when we built
>>> real-time voice masking to investigate the magnitude of bias against women
>>> in technical interviews. *In short, we made men sound like women and
>>> women sound like men and looked at how that affected their interview
>>> performance. We also looked at what happened when women did poorly in
>>> interviews, how drastically that differed from men’s behavior, and why that
>>> difference matters for the thorny issue of the gender gap in tech.*
>>>
>>> The setup
>>>
>>> When an interviewer and an interviewee match on our platform, they meet
>>> in a collaborative coding environment with voice, text chat, and a
>>> whiteboard and jump right into a technical question. Interview questions on
>>> the platform tend to fall into the category of what you’d encounter at a
>>> phone screen for a back-end software engineering role, and interviewers
>>> typically come from a mix of large companies like Google, Facebook, Twitch,
>>> and Yelp, as well as engineering-focused startups like Asana, Mattermark,
>>> and others.
>>>
>>> After every interview, interviewers rate interviewees on a few different
>>> dimensions.
>>> [image: Feedback form for interviewers]
>>> <http://blog.interviewing.io/wp-content/uploads/2016/05/new-interviewer-feedback.png>
>>>
>>> As you can see, we ask the interviewer if they would advance their
>>> interviewee to the next round. We also ask about a few different aspects of
>>> interview performance using a 1-4 scale. On our platform, a score of 3 or
>>> above is generally considered good.
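>>>
>>> To make that concrete, here is a minimal sketch of what one feedback
>>> record might look like (field names are illustrative, not our actual
>>> schema):
>>>
>>>     from dataclasses import dataclass
>>>
>>>     # Hypothetical shape of a single interviewer feedback record.
>>>     @dataclass
>>>     class InterviewFeedback:
>>>         would_advance: bool   # would you advance this person to the next round?
>>>         technical: int        # 1-4; 3 or above is considered good
>>>         problem_solving: int  # 1-4
>>>         communication: int    # 1-4
>>>
>>>     fb = InterviewFeedback(would_advance=True, technical=3,
>>>                            problem_solving=4, communication=3)
>>>     print(fb.technical >= 3)  # True -> a "good" technical score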
>>>
>>> Women historically haven’t performed as well as men…
>>>
>>> One of the big motivators to think about voice masking was the
>>> increasingly uncomfortable disparity in interview performance on the
>>> platform between men and women. At that time, we had amassed over a
>>> thousand interviews with enough data to do some comparisons and were
>>> surprised to discover that women really were doing worse. Specifically, *men
>>> were getting advanced to the next round 1.4 times more often than women.
>>> Interviewee technical score wasn’t faring that well either — men on the
>>> platform had an average technical score of 3 out of 4, as compared to a 2.5
>>> out of 4 for women*.
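>>>
>>> For the curious, this kind of comparison is easy to reproduce on any
>>> interview log. A rough sketch in pandas, with toy rows standing in for the
>>> real data:
>>>
>>>     import pandas as pd
>>>
>>>     # Toy interview records, NOT our actual data.
>>>     df = pd.DataFrame({
>>>         "gender":    ["m", "m", "f", "m", "f", "f"],
>>>         "advanced":  [1, 1, 0, 1, 1, 0],
>>>         "technical": [3, 4, 2, 3, 3, 2],
>>>     })
>>>
>>>     by_gender = df.groupby("gender").agg(
>>>         advance_rate=("advanced", "mean"),
>>>         avg_technical=("technical", "mean"),
>>>     )
>>>     print(by_gender)
>>>     # The ratio of the two advance rates is the "1.4x" style figure above.
>>>     print(by_gender.loc["m", "advance_rate"] / by_gender.loc["f", "advance_rate"])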
>>>
>>> Despite these numbers, it was really difficult for me to believe that
>>> women were just somehow worse at computers, so when some of our customers
>>> asked us to build voice masking to see if that would make a difference in
>>> the conversion rates of female candidates, we didn’t need much convincing.
>>>
>>> … so we built voice masking
>>>
>>> Since we started working on interviewing.io, in order to achieve true
>>> interviewee anonymity, we knew that hiding gender would be something we’d
>>> have to deal with eventually but put it off for a while because it wasn’t
>>> technically trivial to build a real-time voice modulator. Some early ideas
>>> included sending female users a Bane mask.
>>> [image: Early voice masking prototype (drawing by Marcin Kanclerz
>>> <https://medium.com/@noansknv>)]
>>> <http://blog.interviewing.io/wp-content/uploads/2016/06/bane.jpg>
>>>
>>> When the Bane mask thing didn’t work out, we decided we ought to build
>>> something within the app, and if you play the videos below, you can get an
>>> idea of what voice masking on interviewing.io sounds like. In the first
>>> one, I’m talking in my normal voice.
>>>
>>> And in the second one, I’m modulated to sound like a man.
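>>>
>>> We haven't published the details of our modulation pipeline, but the core
>>> idea, shifting pitch while keeping speech intelligible, can be sketched
>>> offline in a few lines. This is only an illustration (librosa, with a
>>> made-up filename and semitone shift), not the code we actually run:
>>>
>>>     import librosa
>>>     import soundfile as sf
>>>
>>>     # Load a recording (filename is hypothetical).
>>>     y, sr = librosa.load("interviewee.wav", sr=None)
>>>
>>>     # Shift the pitch down ~4 semitones to move a typically higher voice
>>>     # into a lower register; the amount here is a guess for illustration.
>>>     masked = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)
>>>
>>>     sf.write("interviewee_masked.wav", masked, sr)
>>>
>>> Doing this in real time, with acceptable latency and a result that still
>>> sounds natural, is what makes the feature non-trivial to build.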
>>>
>>> Armed with the ability to hide gender during technical interviews, we
>>> were eager to see what the hell was going on and get some insight into why
>>> women were consistently underperforming.
>>>
>>> The experiment
>>>
>>> The setup for our experiment was simple. Every Tuesday evening at 7 PM
>>> Pacific, interviewing.io hosts what we call practice rounds. In these
>>> practice rounds, anyone with an account can show up, get matched with an
>>> interviewer, and go to town. And during a few of these rounds, *we
>>> decided to see what would happen to interviewees’ performance when we
>>> started messing with their perceived genders*.
>>>
>>> In the spirit of not giving away what we were doing and potentially
>>> compromising the experiment, we told both interviewees and interviewers
>>> that we were slowly rolling out our new voice masking feature and that they
>>> could opt in or out of helping us test it out. Most people opted in, and we
>>> informed interviewees that their voice might be masked during a given round
>>> and asked them to refrain from sharing their gender with their
>>> interviewers. For interviewers, we simply told them that interviewee voices
>>> might sound a bit processed.
>>>
>>> We ended up with 234 total interviews (roughly 2/3 male and 1/3 female
>>> interviewees), which fell into one of three categories:
>>>
>>>    - Completely unmodulated (useful as a baseline)
>>>    - Modulated without pitch change
>>>    - Modulated with pitch change
>>>
>>> You might ask why we included the second condition, i.e. modulated
>>> interviews that didn’t change the interviewee’s pitch. As you probably
>>> noticed, if you played the videos above, the modulated one sounds fairly
>>> processed. The last thing we wanted was for interviewers to assume that any
>>> processed-sounding interviewee must summarily have been the opposite gender
>>> of what they sounded like. So we threw that condition in as a further
>>> control.
>>>
>>> The results
>>>
>>> After running the experiment, we ended up with some rather surprising
>>> results. *Contrary to what we expected* (and probably contrary to what
>>> you expected as well!), *masking gender had no effect on interview
>>> performance* with respect to any of the scoring criteria (would advance
>>> to next round, technical ability, problem solving ability). If anything, we
>>> started to notice some trends in the opposite direction of what we
>>> expected: for technical ability, it appeared that men who were modulated to
>>> sound like women did a bit better than unmodulated men and that women who
>>> were modulated to sound like men did a bit worse than unmodulated women.
>>> Though these trends weren’t statistically significant, I am mentioning them
>>> because they were unexpected and definitely something to watch for as we
>>> collect more data.
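>>>
>>> For anyone who wants to see what "no effect" means mechanically, the
>>> comparison boils down to something like the following sketch (scipy, with
>>> placeholder score lists rather than our real ones; a Kruskal-Wallis test
>>> is one reasonable choice for 1-4 ordinal scores across three groups):
>>>
>>>     from scipy import stats
>>>
>>>     # Placeholder technical scores (1-4) by condition, NOT the real data.
>>>     unmodulated       = [3, 2, 4, 3, 3, 2, 4]
>>>     modulated_same    = [3, 3, 2, 4, 3, 2, 3]  # processed voice, pitch unchanged
>>>     modulated_swapped = [4, 2, 3, 3, 4, 3, 2]  # pitch shifted across gender
>>>
>>>     h, p = stats.kruskal(unmodulated, modulated_same, modulated_swapped)
>>>     print(f"H={h:.2f}, p={p:.3f}")  # a large p-value means no detectable effect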
>>>
>>> On the subject of sample size, we have no delusions that this is the
>>> be-all and end-all of pronouncements on the subject of gender and interview
>>> performance. We’ll continue to monitor the data as we collect more of it,
>>> and it’s very possible that as we do, everything we’ve found will be
>>> overturned. I will say, though, that had there been any staggering gender
>>> bias on the platform, with a few hundred data points, we would have gotten
>>> some kind of result. So that, at least, was encouraging.
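>>>
>>> To put "we would have gotten some kind of result" in more precise terms:
>>> with a couple hundred interviews, only moderate-to-large effects are
>>> reliably detectable, which you can check with a standard power
>>> calculation. A sketch using statsmodels (the male/female split below is
>>> approximate):
>>>
>>>     from statsmodels.stats.power import TTestIndPower
>>>
>>>     # ~234 interviews, roughly 2/3 male and 1/3 female interviewees.
>>>     n_female, n_male = 78, 156
>>>
>>>     detectable = TTestIndPower().solve_power(
>>>         effect_size=None, nobs1=n_female, ratio=n_male / n_female,
>>>         alpha=0.05, power=0.8)
>>>     # Smallest standardized difference we'd catch 80% of the time.
>>>     print(round(detectable, 2))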
>>>
>>> So if there’s no systemic bias, why are women performing worse?
>>>
>>> After the experiment was over, I was left scratching my head. If the
>>> issue wasn’t interviewer bias, what could it be? I went back and looked at
>>> the seniority levels of men vs. women on the platform as well as the kind
>>> of work they were doing in their current jobs, and neither of those factors
>>> seemed to differ significantly between groups. But there was one nagging
>>> thing in the back of my mind. I spend a lot of my time poring over
>>> interview data, and I had noticed something peculiar when observing the
>>> behavior of female interviewees. Anecdotally, it seemed like women were
>>> leaving the platform a lot more often than men. So I ran the numbers.
>>>
>>> What I learned was pretty shocking. *As it happens, women leave
>>> interviewing.io <http://interviewing.io> roughly 7 times as often as men
>>> after they do badly in an interview.* And the numbers for two bad
>>> interviews aren’t much better. You can see the breakdown of attrition by
>>> gender below (the differences between men and women are indeed
>>> statistically significant with P < 0.00001).
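>>>
>>> One standard way to get a P value like that is a chi-squared test on the
>>> attrition counts. A sketch with placeholder counts that mimic the roughly
>>> 7x difference (these are not our real numbers):
>>>
>>>     from scipy.stats import chi2_contingency
>>>
>>>     # Rows: men, women. Columns: [left after a bad interview, stayed].
>>>     table = [[12, 388],    # men:   ~3% leave
>>>              [35, 125]]    # women: ~22% leave, roughly 7x the male rate
>>>
>>>     chi2, p, dof, expected = chi2_contingency(table)
>>>     print(f"chi2={chi2:.1f}, p={p:.2g}")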
>>>
>>> Also note that as much as possible, I corrected for people leaving the
>>> platform because they found a job (practicing interviewing isn’t that fun
>>> after all, so you’re probably only going to do it if you’re still looking),
>>> were just trying the platform out of curiosity, or didn’t like something
>>> else about their interviewing.io experience.
>>>
>>> A totally speculative thought experiment
>>>
>>> So, if these are the kinds of behaviors that happen in the
>>> interviewing.io microcosm, how much is applicable to the broader world
>>> of software engineering? Please bear with me as I wax hypothetical and try
>>> to extrapolate what we’ve seen here to our industry at large. And also,
>>> please know that what follows is very speculative, based on not that much
>>> data, and could be totally wrong… but you gotta start somewhere.
>>>
>>> If you consider the attrition data points above, you might want to do
>>> what any reasonable person would do in the face of an existential or moral
>>> quandary, i.e. fit the data to a curve. An exponential decay curve seemed
>>> reasonable for attrition behavior, and you can see what I came up with
>>> below. The x-axis is the number of what I like to call “attrition events”,
>>> namely things that might happen to you over the course of your computer
>>> science studies and subsequent career that might make you want to quit. The
>>> y-axis is what portion of people are left after each attrition event. The
>>> red curve denotes women, and the blue curve denotes men.
>>>
>>> See interactive graph with Desmos
>>> <https://www.desmos.com/calculator/tugmyjkaj6>
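>>>
>>> The curves themselves are nothing fancier than exponential (geometric)
>>> decay: if each attrition event knocks out a fixed fraction of whoever is
>>> left, the share remaining after k events is (1 - p)^k. A tiny sketch with
>>> made-up per-event quit probabilities, chosen only to echo the ~7x
>>> attrition difference above:
>>>
>>>     # Share of people still around after k attrition events, assuming a
>>>     # constant per-event probability of quitting.
>>>     def remaining(p_quit, k):
>>>         return (1 - p_quit) ** k
>>>
>>>     p_men, p_women = 0.02, 0.14   # made-up; women ~7x as likely to quit per event
>>>
>>>     for k in range(9):
>>>         print(k, round(remaining(p_men, k), 2), round(remaining(p_women, k), 2))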
>>>
>>> Now, as I said, this is pretty speculative, but it really got me
>>> thinking about what these curves might mean in the broader context of women
>>> in computer science. How many “attrition events” does one encounter between
>>> primary and secondary education and entering a collegiate program in CS and
>>> then starting to embark on a career? So, I don’t know, let’s say there are
>>> 8 of these events between getting into programming and looking around for a
>>> job. If that’s true, then we need 3 times as many women studying computer
>>> science as men to get to the same number in our pipelines. Note that
>>> that’s 3 times as many as men, not 3 times more than there are now. If we
>>> think about how many there are now, which, depending on your source, is
>>> between 1/3 and 1/4 of the number of men, *to get to pipeline parity,
>>> we actually have to increase the number of women studying computer science
>>> by an entire order of magnitude*.
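>>>
>>> Spelled out with the same made-up numbers as the sketch above, the
>>> back-of-the-envelope arithmetic behind that claim is:
>>>
>>>     men_left   = (1 - 0.02) ** 8   # ~0.85 of men make it through 8 events
>>>     women_left = (1 - 0.14) ** 8   # ~0.30 of women do
>>>
>>>     needed = men_left / women_left   # ~3 women per man needed at the start
>>>     # Women currently enter at ~1/3 to 1/4 the rate of men, so the required
>>>     # increase is roughly 3 * 3 to 3 * 4, i.e. about an order of magnitude.
>>>     print(round(needed, 1), round(needed * 3, 1), round(needed * 4, 1))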
>>>
>>> Prior art, or why maybe this isn’t so nuts after all
>>>
>>> Since gathering these findings and starting to talk about them a bit in
>>> the community, I began to realize that there was some supremely interesting
>>> academic work being done on gender differences around self-perception,
>>> confidence, and performance. Some of the work below found slightly
>>> different trends than we did, but it’s clear that anyone attempting to
>>> answer the question of the gender gap in tech would be remiss in not
>>> considering the effects of confidence and self-perception in addition to
>>> the more salient matter of bias.
>>>
>>> In a study investigating the effects of perceived performance on the
>>> likelihood of subsequent engagement
>>> <https://labs.wsu.edu/joyceehrlinger/wp-content/uploads/sites/252/2014/10/EhrlingerDunning2003.pdf>,
>>> Dunning (of Dunning-Kruger fame) and Ehrlinger administered a scientific
>>> reasoning test to male and female undergrads and then asked them how they
>>> did. Not surprisingly, though there was no difference in performance
>>> between genders, women underrated their own performance more often than
>>> men. Afterwards, participants were asked whether they’d like to enter a
>>> Science Jeopardy contest on campus in which they could win cash prizes.
>>> Again, women were significantly less likely to participate, with
>>> participation likelihood being directly correlated with self-perception
>>> rather than actual performance.
>>>
>>> In a different study, sociologists followed a number of male and female
>>> STEM students over the course of their college careers
>>> <https://www.researchgate.net/publication/291016296_Persistence_Is_Cultural_Professional_Socialization_and_the_Reproduction_of_Sex_Segregation>
>>> via diary entries authored by the students. One prevailing trend that
>>> emerged immediately was the difference between how men and women handled
>>> the “discovery of their [place in the] pecking order of talent, an
>>> initiation that is typical of socialization across the professions.” For
>>> women, realizing that they may no longer be at the top of the class and
>>> that there were others who were performing better, “the experience
>>> [triggered] a more fundamental doubt about their abilities to master the
>>> technical constructs of engineering expertise [than men].”
>>>
>>> And of course, what survey of gender difference research would be
>>> complete without an allusion to the wretched annals of dating? When I told
>>> the interviewing.io team about the disparity in attrition between
>>> genders, the resounding response was along the lines of, “Well, yeah. Just
>>> think about dating from a man’s perspective.” Indeed, a study published
>>> in the *Archives of Sexual Behavior*
>>> <http://link.springer.com/article/10.1023/B:ASEB.0000028892.63150.be>
>>> confirms that men treat rejection in dating very differently than women,
>>> even going so far as to say that men “reported they would experience a more
>>> positive than negative affective response after… being sexually rejected.”
>>>
>>> Maybe tying coding to sex is a bit tenuous, but, as they say,
>>> programming is like sex — one mistake and you have to support it for the
>>> rest of your life.
>>>
>>> Why I’m not depressed by our results and why you shouldn’t be either
>>>
>>> Prior art aside, I would like to leave off on a high note. I mentioned
>>> earlier that men are doing a lot better on the platform than women, but
>>> here’s the startling thing. *Once you factor out interview data from
>>> both men and women who quit after one or two bad interviews, the disparity
>>> goes away entirely.* So while the attrition numbers aren’t great, I’m
>>> massively encouraged by the fact that at least in these findings, it’s not
>>> about systemic bias against women or women being bad at computers or
>>> whatever. Rather, it’s about women being bad at dusting themselves off
>>> after failing, which, despite everything, is probably a lot easier to fix.
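>>>
>>> Mechanically, that last claim is just the earlier gender comparison re-run
>>> after filtering out the early quitters; something like this sketch
>>> (pandas, with placeholder rows and made-up column names):
>>>
>>>     import pandas as pd
>>>
>>>     users = pd.DataFrame({
>>>         "gender":         ["m", "f", "m", "f", "m", "f"],
>>>         "quit_after_bad": [False, True, False, True, False, False],
>>>         "avg_technical":  [3.1, 2.0, 2.9, 2.2, 3.0, 3.1],
>>>     })
>>>
>>>     # Drop people who quit after one or two bad interviews, then re-compare.
>>>     stayed = users[~users["quit_after_bad"]]
>>>     print(stayed.groupby("gender")["avg_technical"].mean())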
>>>
>>> [1] Roughly 15% of our users are female. We want way more, but it’s a
>>> start.
>>>
>>> [2] If you want to hear more examples of voice modulation or are just
>>> generously down to indulge me in some shameless bragging, we got to demo it
>>> on NPR
>>> <http://www.npr.org/2016/04/12/473912220/blind-hiring-while-well-meaning-may-create-unintended-consequences>
>>> and in Fast Company
>>> <http://www.fastcompany.com/3059522/this-interviewing-platform-changes-your-voice-to-eliminate-unconscious-bias>
>>> .
>>>
>>> [3] In addition to asking interviewers how interviewees did, we also ask
>>> interviewees to rate themselves
>>> <http://blog.interviewing.io/wp-content/uploads/2015/12/interviewee-feedback.png>.
>>> After reading the Dunning and Ehrlinger study, we went back and checked to
>>> see what role self-perception played in attrition. In our case, the answer
>>> is, I’m afraid, TBD, as we’re going to need more self-ratings to say
>>> anything conclusive.
>>>
>>>
>>> Sent from my iPhone
>>>
>

