Re: [sudo-discuss] We built voice modulation to mask gender in technical interviews. Here’s what happened. – interviewing.io blog

5 Jul 2016

TL;DR:
Surprisingly, the perceived gender of the voice had no effect on results.
The main difference seemed to lie in much higher numbers of women leaving
the platform after one or two poor interviews.
*"it’s not about systemic bias against women or women being bad at
computers or whatever. Rather, it’s about women being bad at dusting
themselves off after failing."*
Patrik
On Jul 5, 2016 6:36 AM, "Romy Ilano" <romy(a)snowyla.com> wrote:
...
  Maybe this is another reason why coed groups may benefit women more than
 all-women groups?
http://blog.interviewing.io/we-built-voice-modulation-to-mask-gender-in-tec…
 We built voice modulation to mask gender in technical interviews. Here’s
 what happened.
 June 29th, 2016
 Posted by Aline Lerner <https://twitter.com/alinelernerLLC>
 interviewing.io <http://www.interviewing.io> is a platform where people
 can practice technical interviewing anonymously and, in the process, find
 jobs based on their interview performance rather than their resumes. Since
 we started, we’ve amassed data from thousands of technical interviews, and
 in this blog, we routinely share some of the surprising stuff we’ve
 learned. In this post, I’ll talk about what happened when we built
 real-time voice masking to investigate the magnitude of bias against women
 in technical interviews. *In short, we made men sound like women and
 women sound like men and looked at how that affected their interview
 performance. We also looked at what happened when women did poorly in
 interviews, how drastically that differed from men’s behavior, and why that
 difference matters for the thorny issue of the gender gap in tech.*
 The setup
 When an interviewer and an interviewee match on our platform, they meet in
 a collaborative coding environment with voice, text chat, and a whiteboard
 and jump right into a technical question. Interview questions on the
 platform tend to fall into the category of what you’d encounter at a phone
 screen for a back-end software engineering role, and interviewers typically
 come from a mix of large companies like Google, Facebook, Twitch, and Yelp,
 as well as engineering-focused startups like Asana, Mattermark, and others.
 After every interview, interviewers rate interviewees on a few different
 dimensions.
 [image: Feedback form for interviewers]
<http://blog.interviewing.io/wp-content/uploads/2016/05/new-interviewer-feedback.png>
 Feedback form for interviewers
 As you can see, we ask the interviewer if they would advance their
 interviewee to the next round. We also ask about a few different aspects of
 interview performance using a 1-4 scale. On our platform, a score of 3 or
 above is generally considered good.
 Women historically haven’t performed as well as men…
 One of the big motivators to think about voice masking was the
 increasingly uncomfortable disparity in interview performance on the
 platform between men and women. At that time, we had amassed over a
 thousand interviews with enough data to do some comparisons and were
 surprised to discover that women really were doing worse. Specifically, *men
 were getting advanced to the next round 1.4 times more often than women.
 Interviewee technical score wasn’t faring that well either — men on the
 platform had an average technical score of 3 out of 4, as compared to a 2.5
 out of 4 for women*.
 Despite these numbers, it was really difficult for me to believe that
 women were just somehow worse at computers, so when some of our customers
 asked us to build voice masking to see if that would make a difference in
 the conversion rates of female candidates, we didn’t need much convincing.
 … so we built voice masking
 Ever since we started working on interviewing.io, we knew that achieving
 true interviewee anonymity would eventually mean hiding gender, but we put
 it off for a while because building a real-time voice modulator isn’t
 technically trivial. Some early ideas included sending female users a Bane
 mask.
 [image: Early voice masking prototype]
 <http://blog.interviewing.io/wp-content/uploads/2016/06/bane.jpg>
 Early voice masking prototype (drawing by Marcin Kanclerz
 <https://medium.com/@noansknv>)
 When the Bane mask thing didn’t work out, we decided we ought to build
 something within the app, and if you play the videos below, you can get an
 idea of what voice masking on interviewing.io sounds like. In the first
 one, I’m talking in my normal voice.
 And in the second one, I’m modulated to sound like a man.
 Armed with the ability to hide gender during technical interviews, we were
 eager to see what the hell was going on and get some insight into why women
 were consistently underperforming.
 The experiment
 The setup for our experiment was simple. Every Tuesday evening at 7 PM
 Pacific, interviewing.io hosts what we call practice rounds. In these
 practice rounds, anyone with an account can show up, get matched with an
 interviewer, and go to town. And during a few of these rounds, *we
 decided to see what would happen to interviewees’ performance when we
 started messing with their perceived genders*.
 So as not to give away what we were doing and potentially compromise the
 experiment, we told both interviewees and interviewers that we were slowly
 rolling out our new voice masking feature and that they could choose
 whether to help us test it. Most people opted in. We informed interviewees
 that their voice might be masked during a given round and asked them not to
 share their gender with their interviewers. Interviewers were simply told
 that interviewee voices might sound a bit processed.
 We ended up with 234 total interviews (roughly 2/3 male and 1/3 female
 interviewees), which fell into one of three categories:
    - Completely unmodulated (useful as a baseline)
    - Modulated without pitch change
    - Modulated with pitch change
 You might ask why we included the second condition, i.e. modulated
 interviews that didn’t change the interviewee’s pitch. If you played the
 videos above, you probably noticed that the modulated one sounds fairly
 processed. The last thing we wanted was for interviewers to assume that any
 processed-sounding interviewee must automatically be the opposite gender of
 what they sounded like. So we threw that condition in as a further control.
 The results
 After running the experiment, we ended up with some rather surprising
 results. *Contrary to what we expected* (and probably contrary to what
 you expected as well!), *masking gender had no effect on interview
 performance* with respect to any of the scoring criteria (would advance
 to next round, technical ability, problem solving ability). If anything, we
 started to notice some trends in the opposite direction of what we
 expected: for technical ability, it appeared that men who were modulated to
 sound like women did a bit better than unmodulated men and that women who
 were modulated to sound like men did a bit worse than unmodulated women.
 Though these trends weren’t statistically significant, I am mentioning them
 because they were unexpected and definitely something to watch for as we
 collect more data.
 On the subject of sample size, we have no delusions that this is the
 be-all and end-all of pronouncements on the subject of gender and interview
 performance. We’ll continue to monitor the data as we collect more of it,
 and it’s very possible that as we do, everything we’ve found will be
 overturned. I will say, though, that had there been any staggering gender
 bias on the platform, with a few hundred data points, we would have gotten
 some kind of result. So that, at least, was encouraging.
 So if there’s no systemic bias, why are women performing worse?
 After the experiment was over, I was left scratching my head. If the issue
 wasn’t interviewer bias, what could it be? I went back and looked at the
 seniority levels of men vs. women on the platform as well as the kind of
 work they were doing in their current jobs, and neither of those factors
 seemed to differ significantly between groups. But there was one nagging
 thing in the back of my mind. I spend a lot of my time poring over
 interview data, and I had noticed something peculiar when observing the
 behavior of female interviewees. Anecdotally, it seemed like women were
 leaving the platform a lot more often than men. So I ran the numbers.
 What I learned was pretty shocking. *As it happens, women leave
 interviewing.io <http://interviewing.io> roughly 7 times as often as men
 after they do badly in an interview.* And the numbers for two bad
 interviews aren’t much better. You can see the breakdown of attrition by
 gender below (the differences between men and women are indeed
 statistically significant with P < 0.00001).
 Also note that, as much as possible, I corrected for people leaving the
 platform because they found a job (practicing interviewing isn’t that fun
 after all, so you’re probably only going to do it if you’re still looking),
 were just trying the platform out of curiosity, or didn’t like something
 else about their interviewing.io experience.
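 The post reports p < 0.00001 for the attrition gap but doesn’t say which
 test was used. A two-proportion z-test is one common way to compare two
 attrition rates, so the Python sketch below assumes that test. The counts in
 the usage line are invented for illustration (chosen only so the two rates
 differ by roughly the 7x the post describes); they are not interviewing.io’s
 data.

```python
import math


def two_proportion_z_test(left_a: int, total_a: int,
                          left_b: int, total_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions.

    left_*:  people in the group who left after a bad interview
    total_*: people in the group who had a bad interview
    Returns (z statistic, two-sided p-value).
    """
    rate_a = left_a / total_a
    rate_b = left_b / total_b
    # Under the null hypothesis both groups share one attrition rate,
    # so estimate it by pooling the two samples.
    pooled = (left_a + left_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (rate_a - rate_b) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: group A leaves at 35%, group B at 5%.
z, p = two_proportion_z_test(left_a=35, total_a=100, left_b=5, total_b=100)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

 The normal approximation behind this test needs reasonably large counts in
 each group; for small samples an exact test such as Fisher’s is the usual
 alternative.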
 A totally speculative thought experiment
 So, if these are the kinds of behaviors that happen in the interviewing.io
 microcosm, how much is applicable to the broader world of software
 engineering? Please bear with me as I wax hypothetical and try to
 extrapolate what we’ve seen here to our industry at large. And also, please
 know that what follows is very speculative, based on not that much data,
 and could be totally wrong… but you gotta start somewhere.
 If you consider the attrition data points above, you might want to do what
 any reasonable person would do in the face of an existential or moral
 quandary, i.e. fit the data to a curve. An exponential decay curve seemed
 reasonable for attrition behavior, and you can see what I came up with
 below. The x-axis is the number of what I like to call “attrition events”,
 namely things that might happen to you over the course of your computer
 science studies and subsequent career that might make you want to quit. The
 y-axis is what portion of people are left after each attrition event. The
 red curve denotes women, and the blue curve denotes men.
 See interactive graph with Desmos
 <https://www.desmos.com/calculator/tugmyjkaj6>
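 One simple exponential decay form for these curves is S(n) = e^(-kn): the
 fraction of a group still around after n attrition events, with a separate
 decay constant k for each group. The Python sketch below assumes that form;
 the two constants are hypothetical placeholders, not the values fitted in
 the Desmos graph above.

```python
import math

# Hypothetical decay constants for illustration only; they are NOT the
# parameters behind the post's Desmos graph.
K_MEN = 0.05
K_WOMEN = 0.19


def remaining_fraction(decay_constant: float, events: int) -> float:
    """Fraction of a cohort left after `events` attrition events,
    assuming exponential decay S(n) = exp(-k * n)."""
    if decay_constant < 0 or events < 0:
        raise ValueError("decay_constant and events must be non-negative")
    return math.exp(-decay_constant * events)


# Print both curves side by side for 0..8 attrition events.
for n in range(9):
    men = remaining_fraction(K_MEN, n)
    women = remaining_fraction(K_WOMEN, n)
    print(f"{n} events: men {men:.2f}, women {women:.2f}, ratio {men / women:.2f}")
```

 Because both curves are exponentials, the ratio between them is itself
 exponential in n, e^((k_women - k_men)n), so even a modest difference in
 per-event attrition compounds quickly over many events.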
 Now, as I said, this is pretty speculative, but it really got me thinking
 about what these curves might mean in the broader context of women in
 computer science. How many “attrition events” does one encounter between
 primary and secondary school, entering a college CS program, and starting
 a career? So, I don’t know, let’s say there are 8 of these events between
 getting into programming and looking around for a job. If that’s true, then
 we need 3 times as many women studying computer science as men to get to
 the same number in our pipelines. Note that that’s 3 times more than men,
 not 3 times more than there are now. If we think about how many there are
 now, which, depending on your source, is between 1/3 and 1/4 of the number
 of men, *to get to pipeline parity, we actually have to increase the number
 of women studying computer science by an entire order of magnitude*.
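 The order-of-magnitude claim is simple arithmetic on the post’s own
 numbers: if the pipeline needs 3 women per man and today there are between
 1/3 and 1/4 as many women as men, the number of women has to grow by
 3 ÷ (1/3) = 9 to 3 ÷ (1/4) = 12 times. A small Python sketch using exact
 fractions (the function name is our own, purely illustrative):

```python
from fractions import Fraction


def required_growth(women_per_man_needed: Fraction,
                    women_per_man_today: Fraction) -> Fraction:
    """Factor by which the number of women studying CS must grow so that
    there are `women_per_man_needed` women for every man entering the pipeline."""
    if women_per_man_today <= 0:
        raise ValueError("women_per_man_today must be positive")
    return women_per_man_needed / women_per_man_today


NEEDED = Fraction(3)  # the post's estimate for 8 attrition events

for today in (Fraction(1, 3), Fraction(1, 4)):
    print(f"today {today} women per man -> grow by {required_growth(NEEDED, today)}x")
# prints:
# today 1/3 women per man -> grow by 9x
# today 1/4 women per man -> grow by 12x
```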
 Prior art, or why maybe this isn’t so nuts after all
 Since gathering these findings and starting to talk about them a bit in
 the community, I began to realize that there was some supremely interesting
 academic work being done on gender differences around self-perception,
 confidence, and performance. Some of the work below found slightly
 different trends than we did, but it’s clear that anyone attempting to
 answer the question of the gender gap in tech would be remiss not to
 consider the effects of confidence and self-perception in addition to the
 more salient matter of bias.
 In a study investigating the effects of perceived performance on the
 likelihood of subsequent engagement
<https://labs.wsu.edu/joyceehrlinger/wp-content/uploads/sites/252/2014/10/EhrlingerDunning2003.pdf>,
 Dunning (of Dunning-Kruger fame) and Ehrlinger administered a scientific
 reasoning test to male and female undergrads and then asked them how they
 did. Not surprisingly, though there was no difference in performance
 between genders, women underrated their own performance more often than
 men. Afterwards, participants were asked whether they’d like to enter a
 Science Jeopardy contest on campus in which they could win cash prizes.
 Again, women were significantly less likely to participate, with
 participation likelihood being directly correlated with self-perception
 rather than actual performance.
 In a different study, sociologists followed a number of male and female
 STEM students over the course of their college careers
<https://www.researchgate.net/publication/291016296_Persistence_Is_Cultural_Professional_Socialization_and_the_Reproduction_of_Sex_Segregation>
 via diary entries authored by the students. One prevailing trend that
 emerged immediately was the difference between how men and women handled
 the “discovery of their [place in the] pecking order of talent, an
 initiation that is typical of socialization across the professions.” When
 women realized that they might no longer be at the top of the class and that
 others were performing better, “the experience [triggered] a more
 fundamental doubt about their abilities to master the technical constructs
 of engineering expertise [than men].”
 And of course, what survey of gender difference research would be complete
 without an allusion to the wretched annals of dating? When I told the
 interviewing.io team about the disparity in attrition between genders,
 the resounding response was along the lines of, “Well, yeah. Just think
 about dating from a man’s perspective.” Indeed, a study published in the *Archives
 of Sexual Behavior*
 <http://link.springer.com/article/10.1023/B:ASEB.0000028892.63150.be>
 confirms that men treat rejection in dating very differently than women,
 even going so far as to say that men “reported they would experience a more
 positive than negative affective response after… being sexually rejected.”
 Maybe tying coding to sex is a bit tenuous, but, as they say, programming
 is like sex — one mistake and you have to support it for the rest of your
 life.
 Why I’m not depressed by our results and why you shouldn’t be either
 Prior art aside, I would like to end on a high note. I mentioned
 earlier that men are doing a lot better on the platform than women, but
 here’s the startling thing. *Once you factor out interview data from both
 men and women who quit after one or two bad interviews, the disparity goes
 away entirely.* So while the attrition numbers aren’t great, I’m
 massively encouraged by the fact that at least in these findings, it’s not
 about systemic bias against women or women being bad at computers or
 whatever. Rather, it’s about women being bad at dusting themselves off
 after failing, which, despite everything, is probably a lot easier to fix.
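 The post doesn’t say exactly how the quitters were identified, so here is
 one possible reading as a Python sketch. Everything in it is an assumption:
 interview records are hypothetical (user_id, gender, technical_score)
 tuples, a bad interview is a technical score below 3 (the post treats 3 or
 above as good), and a user “quit after one or two bad interviews” if their
 whole history is one or two interviews, all of them bad.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record shape: (user_id, gender, technical_score on the 1-4 scale).
Interview = tuple[str, str, int]

GOOD_SCORE = 3  # the post treats a score of 3 or above as good


def quit_after_bad_start(scores: list[int]) -> bool:
    """Assumed rule: only one or two interviews in total, and every one was bad."""
    return 1 <= len(scores) <= 2 and all(s < GOOD_SCORE for s in scores)


def mean_score_by_gender(interviews: list[Interview],
                         exclude_quitters: bool) -> dict[str, float]:
    # Group each user's scores so the quit rule can look at their whole history.
    scores_by_user: dict[str, list[int]] = defaultdict(list)
    gender_of: dict[str, str] = {}
    for user_id, gender, score in interviews:
        scores_by_user[user_id].append(score)
        gender_of[user_id] = gender

    scores_by_gender: dict[str, list[int]] = defaultdict(list)
    for user_id, scores in scores_by_user.items():
        if exclude_quitters and quit_after_bad_start(scores):
            continue  # drop every interview from this user
        scores_by_gender[gender_of[user_id]].extend(scores)
    return {g: mean(s) for g, s in scores_by_gender.items()}


# Invented toy data, for illustration only.
toy = [("u1", "F", 2), ("u2", "F", 3), ("u2", "F", 3),
       ("u3", "M", 3), ("u3", "M", 3)]
print(mean_score_by_gender(toy, exclude_quitters=False))
print(mean_score_by_gender(toy, exclude_quitters=True))
```

 On the toy data, the women’s mean rises from about 2.67 to 3 once u1 (one
 bad interview, then gone) is dropped, while the men’s stays at 3. A real
 analysis would also need to know which users had actually left rather than
 simply not come back yet.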
 [1] Roughly 15% of our users are female. We want way more, but it’s a start.
 [2] If you want to hear more examples of voice modulation or are just
 generously down to indulge me in some shameless bragging, we got to demo it
 on NPR
<http://www.npr.org/2016/04/12/473912220/blind-hiring-while-well-meaning-may-create-unintended-consequences>
 and in Fast Company
<http://www.fastcompany.com/3059522/this-interviewing-platform-changes-your-voice-to-eliminate-unconscious-bias>.
 [3] In addition to asking interviewers how interviewees did, we also ask
 interviewees to rate themselves
 <http://blog.interviewing.io/wp-content/uploads/2015/12/interviewee-feedback.png>.
 After reading the Dunning and Ehrlinger study, we went back and checked to
 see what role self-perception played in attrition. In our case, the answer
 is, I’m afraid, TBD, as we’re going to need more self-ratings to say
 anything conclusive.
 _______________________________________________
 sudo-discuss mailing list
 sudo-discuss(a)lists.sudoroom.org
 https://sudoroom.org/lists/listinfo/sudo-discuss
