In my last blog post, I offered a look at the process HireVue follows when developing a pre-hire assessment model for each type of job. HireVue has always been committed to good science that creates a level playing field for all candidates and helps companies consider a larger and more diverse set of candidates than ever before. We created HireVue Assessments in part to reduce the issues associated with unconscious human bias in screening interviews, and it worked. Our customers have shared stories of how they’ve increased the diversity of their hires as a result. But what about algorithmic bias?
If you judged AI/machine-learning algorithms solely by the headlines these days, you might think that all algorithms are as biased as we humans are, or worse. Without deliberate work to reduce bias that may reside in an algorithm’s training data or be introduced by the data scientists who build it, algorithms are absolutely at risk of inheriting the biases of humans. From the earliest days of designing HireVue Assessments, the team has been deeply committed to an ethical, rigorous, and ongoing process of testing for and mitigating bias in our HireVue Assessments models (also referred to as algorithms).
When we create a job-specific algorithm or model, a primary focus of our process is finding and eliminating factors that cause bias. For every model built, the HireVue team follows these steps:
This process is standard operating procedure for every model HireVue builds, one per job type.
New misconceptions about these assessment models and other AI products appear in the media every week. Let’s talk about some of the most common ones.
This is simply not accurate. First of all, a HireVue Assessments model/algorithm is not a robot, but a form of AI/machine learning that has a single, specific, early-stage evaluation to perform. Its only focus is determining which subset of candidates within a given pool are most likely to be successful when compared to people already performing the job. That information is then provided to human recruiters as decision support.
Those top candidates then move on from the screening stage to the person-to-person interviewing stages. Skilled recruiting professionals continue to decide which candidate gets the job after the completion of multiple stages in the hiring process.
In the case of HireVue Assessments models, the algorithm is paying attention only to those factors of the interview that research has proven to be predictive of success in the job.
On the other hand, human interviewers are often distracted by many other factors in an interview, factors that are unrelated to success in that particular job. In addition, humans tend to have weak definitions of success in job roles, and all too frequently revert to “gut instinct” that can often be driven by unconscious bias.
Here’s an example: The model might notice that most of a company’s top technical support representatives tend to speak more slowly than the rest. It may also happen to be the case that speaking slowly is more common in men than women, and this might skew the results so that the model rates men more highly than women. If we find this during testing, we can “shut off” the feature that measures speaking speed in order to prevent men from being given higher scores than women based on this feature. We then retest the model to ensure we’ve addressed the adverse impact.
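To make the test, shut off, and retest loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not HireVue's actual implementation: the feature names (`slow_speech`, `answer_quality`), the toy scores, the equal weights, and the pass threshold are all hypothetical, and the 0.8 cutoff borrows the "four-fifths" rule of thumb from the US EEOC Uniform Guidelines, which this post does not say HireVue uses.

```python
from collections import defaultdict

# Assumption: the four-fifths rule of thumb, not a threshold stated in the post.
FOUR_FIFTHS = 0.8


def score(features, weights, disabled=frozenset()):
    """Weighted average of the enabled features (hypothetical scoring rule)."""
    active = {name: w for name, w in weights.items() if name not in disabled}
    return sum(w * features[name] for name, w in active.items()) / sum(active.values())


def selection_rates(candidates, weights, pass_mark, disabled=frozenset()):
    """Fraction of each group whose score meets the pass mark."""
    passed, total = defaultdict(int), defaultdict(int)
    for group, features in candidates:
        total[group] += 1
        if score(features, weights, disabled) >= pass_mark:
            passed[group] += 1
    return {group: passed[group] / total[group] for group in total}


def adverse_impact_ratio(rates):
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    return 1.0 if highest == 0 else min(rates.values()) / highest


# Toy data: slow speech is more common among the men in this made-up pool.
candidates = [
    ("men", {"slow_speech": 0.9, "answer_quality": 0.6}),
    ("men", {"slow_speech": 0.8, "answer_quality": 0.4}),
    ("men", {"slow_speech": 0.9, "answer_quality": 0.7}),
    ("men", {"slow_speech": 0.8, "answer_quality": 0.3}),
    ("women", {"slow_speech": 0.2, "answer_quality": 0.6}),
    ("women", {"slow_speech": 0.4, "answer_quality": 0.7}),
    ("women", {"slow_speech": 0.2, "answer_quality": 0.4}),
    ("women", {"slow_speech": 0.1, "answer_quality": 0.3}),
]
weights = {"slow_speech": 1.0, "answer_quality": 1.0}

# Step 1: test the model with every feature enabled.
ratio_before = adverse_impact_ratio(selection_rates(candidates, weights, 0.5))

# Step 2: "shut off" the feature that drives the disparity, then retest.
ratio_after = adverse_impact_ratio(
    selection_rates(candidates, weights, 0.5, disabled=frozenset({"slow_speech"}))
)

print(f"before: {ratio_before:.2f}  flagged: {ratio_before < FOUR_FIFTHS}")
print(f"after:  {ratio_after:.2f}  flagged: {ratio_after < FOUR_FIFTHS}")
```

In this made-up pool, the full model passes far more men than women, so the ratio falls well below the cutoff. Once the speaking-speed feature is disabled and the remaining feature is rescored, both groups pass at the same rate. A real audit would also confirm that the model still predicts job performance after the feature is removed.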
Decades of research have shown that traditional interviews are full of implicit and explicit bias, and tremendous inconsistency. The HireVue approach has been proven to be measurably more accurate at predicting performance than human evaluators and is audited, tested, retrained, and audited again to ensure that there is no adverse impact.
This is even less true with an algorithm performing the evaluation than in human interviewing, because the models have been built and trained to “notice” and evaluate only the characteristics that are significant to job success. Charisma may matter in some jobs, but most of the time it will not be important, and therefore would not be considered by the model.
As with the other misconceptions above, human interviewers are far more likely to negatively judge a candidate based on nervousness than a machine-learning model is. The truth is that most of us are nervous doing any kind of interview; in fact, most people don’t even like interviewing. Interviews, assessments, and resumes represent a variety of ways that companies get to know people. The difference here is that the HireVue Assessments model can overlook the parts of your performance that really don’t make a difference, whereas human recruiters may not be able to avoid noticing them.
These misconceptions raise a critical question: Which should we prefer, a world where hiring is influenced by a human with an unclear definition of job success who asks inconsistent questions, may or may not be paying attention to the candidate’s answers, and evaluates on unknown criteria, or a data-driven method that’s fairer, consistent, auditable, improvable, and inclusive?
I know that I speak for everyone on the HireVue product development and data science team when I say that we are committed to actively working to reduce bias in the hiring process and to open up more opportunities for a wider variety of well-qualified people. Look for more blog posts on our process, our commitment to scientific standards, and our work on AI ethics in the coming months.