Like many companies around the world, HireVue has adapted to a hybrid working model, and we’ve downsized our physical office. While we were cleaning out our space, we found an old closet filled with boxes of ancient webcams (we used to ship them to candidates before front-facing cameras were on every device). Why am I talking about outdated hardware? Because this is a perfect illustration of our product journey, and it applies to the evolution of our assessments: as technology improves, we implement best practices as they become available and relegate old tech to the literal or proverbial dustbin.
We’re excited to announce that, thanks to recent innovations by our team, we’ve seen a significant predictive improvement in our assessment models. What does this mean for talent acquisition teams in their day-to-day operations? It means even more accurate prediction of job-related competencies. With these new models, we are rolling out the following changes:
Machine learning systems are particularly good at undertaking complex tasks, such as understanding language. We tested, and ultimately implemented, a new third-party vendor for speech transcription called Rev.ai. Rev.ai is best-in-class when it comes to error rates, as well as accuracy in transcribing both native and non-native English speakers with a variety of accents.
The technology recognizes spoken words based on learning from more than 50,000 hours of audio paired with human-transcribed content across a wide range of topics, industries, accents, and inflections.
This particular improvement gets a bit into the weeds, but the TL;DR is that our natural language processing advances are better able to quantify meaning and context, and are more specific to our application.
Technologies that use machine learning have recently undergone huge improvements on natural language tasks, such as summarization and discourse analysis. Our model was adapted from a RoBERTa model, which is state-of-the-art and widely used across different industries. Our team has adapted the base model to generate even greater improvements for HireVue customers. Concretely, rather than using the input language features that RoBERTa produces out of the box, we fine-tuned the neural network on job interview data to create features that are even more specific to language that matters in the interview context.
With these powerful input features from language, we optimize our assessments to produce scores that closely match interviews scored by trained human evaluators. The evaluators are measuring job-related competencies such as team orientation, adaptability, and willingness to learn on a highly standardized rubric.
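To make the setup above concrete, here is a minimal sketch, with invented data and a simple least-squares fit standing in for HireVue's actual models: language-derived features for each interview response are fit against competency ratings assigned by trained human evaluators on a standardized rubric, and agreement is measured by correlating model scores with the human scores.

```python
import numpy as np

# Hypothetical illustration of the supervised setup: language features per
# interview response (X) are fit to human evaluators' rubric ratings (y),
# so that model scores closely match human-scored interviews.

rng = np.random.default_rng(0)

n_responses, n_features = 200, 8
X = rng.normal(size=(n_responses, n_features))        # language features per response
true_w = rng.normal(size=n_features)                  # unknown feature-rating relationship
human_ratings = X @ true_w + rng.normal(scale=0.1, size=n_responses)

# Ordinary least squares: choose weights so model scores match human ratings.
w, *_ = np.linalg.lstsq(X, human_ratings, rcond=None)
model_scores = X @ w

# Agreement between model scores and human evaluators (Pearson correlation).
r = np.corrcoef(model_scores, human_ratings)[0, 1]
print(round(r, 3))
```

The feature names and linear fit here are stand-ins; the point is the objective: scores are optimized to agree with trained human evaluators, not learned from unconstrained signals.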
Our advanced methods are more robust in understanding context, which is particularly important when the same word can have different meanings depending on the context. For example, the word “bank” is used in two different contexts in this sentence: “Joanne went to the river bank today, and she visited the bank to withdraw cash on the way home.” In this case, our solutions know the difference in usage.
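A toy illustration (nothing like our production models, which use neural representations) of why context matters: a static word vector assigns “bank” a single representation, while a context-sensitive approach derives the representation from surrounding words, which separates the two senses.

```python
# Represent each occurrence of a word by the words around it; the two
# senses of "bank" end up with very different context representations.

def context_vector(tokens, index, window=2):
    """Represent a word occurrence by the set of words around it."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    return {tokens[i] for i in range(lo, hi) if i != index}

def jaccard(a, b):
    """Overlap between two context sets (1.0 = identical contexts)."""
    return len(a & b) / len(a | b)

sentence = ("joanne went to the river bank today and she visited "
            "the bank to withdraw cash on the way home").split()

# Find both occurrences of "bank" and compare their contexts.
occurrences = [i for i, tok in enumerate(sentence) if tok == "bank"]
v1 = context_vector(sentence, occurrences[0])
v2 = context_vector(sentence, occurrences[1])

print(jaccard(v1, v2))  # low overlap: the two senses sit in different contexts
```

A model that sees only the word itself would score both occurrences identically; one that sees the context can treat them differently, which is exactly what fine-tuned transformer features provide.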
Although we capture videos for later human review, our artificial intelligence only scores what is said by the candidate, and it does not use any visual analysis (meaning that we do not assess or try to interpret a candidate’s facial expressions, body language, emotions, or their background and surroundings). We stopped using video inputs in new models early in 2020, and going forward we will also eliminate speech inputs.
Speech inputs include things like variation in tone or pauses. Our internal research showed that these were not adding much additional predictive value to our assessments. As a result, non-language inputs will not be in new assessment models and are being removed from older models as they come up for review. We hope that this change will give candidates even greater assurance that the content of their responses is the most important element to assessing their interview.
Previously, our bias mitigation procedure was iterative, with ongoing input removal and reevaluation, but now we’ve built fairness considerations directly into the model optimization at training time. This results in a much smaller loss in predictive power when a model is mitigated. It means our models are not only incentivized to predict a job-related outcome accurately, but are simultaneously penalized if they detect any demographic group differences in assessment scores (e.g., men and women having meaningfully different scores). This approach effectively obscures information that leads to bias, preventing our models from propagating human bias or underrepresentation that may be present in the training data.
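The idea of penalizing group differences at training time can be sketched as follows. This is a minimal illustration with invented data and a plain gradient-descent loop, not HireVue’s actual optimizer: the loss combines an accuracy term (predict the job-related outcome) with a fairness term (the squared gap between group mean scores), so mitigation happens during optimization rather than after it.

```python
import numpy as np

# Sketch of fairness-penalized training: minimize
#   MSE(scores, outcome) + lam * (mean score of group 1 - mean score of group 0)^2
# so accuracy and group parity are optimized simultaneously.

rng = np.random.default_rng(1)
n, d = 400, 5
group = rng.integers(0, 2, size=n)              # 0/1 demographic label (synthetic)
X = rng.normal(size=(n, d))
X[:, 0] += group                                # a feature that differs by group
true_w = rng.normal(size=d)
y = X @ true_w + rng.normal(scale=0.2, size=n)  # job-related outcome

lam, lr = 50.0, 0.01                            # fairness weight, learning rate
delta = X[group == 1].mean(axis=0) - X[group == 0].mean(axis=0)
w = np.zeros(d)

for _ in range(2000):
    scores = X @ w
    grad_mse = 2 * X.T @ (scores - y) / n       # accuracy term
    gap = scores[group == 1].mean() - scores[group == 0].mean()
    grad_fair = 2 * gap * delta                 # fairness term: d(gap^2)/dw
    w -= lr * (grad_mse + lam * grad_fair)

final_gap = (X @ w)[group == 1].mean() - (X @ w)[group == 0].mean()
print(round(abs(final_gap), 4))  # near zero: group score gap penalized away
```

Because the penalty is inside the training objective, the model trades a small amount of raw predictive fit for parity directly, instead of removing inputs after the fact and retraining repeatedly.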
The enhancements we’ve produced with the changes above are, quite frankly, astounding. We see on average an almost 40% improvement in our power to predict job-related competencies with this update, a magnitude we have not seen since we first started building assessments in 2014. And while the IO Psychology, Data Science, and Product teams knew that this work would yield benefits, we were collectively overjoyed at just how large the improvements are for our customers and their candidates. We will continually push what’s possible in HR technology by following the science and are always willing to embrace proven solutions.