Babylon Health

Ladies (and everyone else), let's kick some bias



How can we prevent gender bias in medical AI technology?

Gender bias in healthcare is a well-recognised issue. From diagnosis to drug development and treatment, the modern healthcare system has been shown to advantage men over women. The statistics on gender differences in pain management are a poignant example: women who are in pain wait longer to be prescribed painkillers [1]. Pain is more likely to be misdiagnosed as a mental health issue in women, and brain tumours presenting as a headache are diagnosed later in women [2,3]. Women have also historically been excluded from drug trials [4]. 80% of painkillers have only ever been tested on men, despite the fact that 70% of people with chronic pain are women [5].

Responsibly designed artificial intelligence (AI) and machine learning algorithms have the potential to overcome gender bias in medicine. However, if machine learning methods are implemented without careful thought and consideration, they can perpetuate and even accentuate existing biases. We have already seen evidence of gender bias in the development of AI technology. For example, natural language processing (NLP) systems have been shown to perform better for male speakers than for female speakers [6]. With the rapid growth of medical AI technology, it is crucial that we identify the risks of gender bias and do all that we can to mitigate them.

How can we develop technology in a way that prevents rather than perpetuates bias? Here are four key principles that can help.

Use diverse training data sets

Women have historically been underrepresented in medical trials, yet they receive treatment based on the conclusions of these studies. In a telling example, when a safety trial for a new ‘female Viagra’ pill was conducted in 2015, 92% of the study participants were men [7].

The same pattern could be perpetuated in AI models if we do not ensure that data from women are adequately represented in training sets. Women go through significant hormonal and physiological changes during their lives, including menstrual cycle changes, pregnancy and the menopause. It is important that women at all life stages are represented in data sets to make sure that we can effectively assess women of all ages.
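A practical place to start is auditing how a training set is composed before any model is trained. The Python sketch below is illustrative only. The field names (`sex`, `age_band`), the age bands, the made-up toy records and the `min_share` thresholds are all hypothetical assumptions. They do not come from any real data set or from Babylon's own pipeline. Because the expected groups are passed in explicitly, a life stage with no records at all is reported as 0% instead of being silently left out.

```python
from collections import Counter


def representation_report(records, group_key, expected_groups, min_share):
    """Return {group: (share, is_under_represented)} for each expected group.

    Groups that never appear in `records` get a share of 0.0, so a missing
    life stage is flagged instead of silently disappearing from the report.
    Groups present in the data but not listed in `expected_groups` are ignored.
    """
    counts = Counter(record[group_key] for record in records)
    total = len(records)
    report = {}
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        report[group] = (share, share < min_share)
    return report


# Hypothetical toy records; field names and age bands are illustrative assumptions.
records = [
    {"sex": "male", "age_band": "18-39"},
    {"sex": "male", "age_band": "40-64"},
    {"sex": "male", "age_band": "65+"},
    {"sex": "male", "age_band": "40-64"},
    {"sex": "female", "age_band": "18-39"},
    {"sex": "female", "age_band": "18-39"},
]

# Overall balance between sexes (the 0.4 threshold is an arbitrary example).
sex_report = representation_report(records, "sex", ["female", "male"], min_share=0.4)

# Coverage of age bands among women only, as a rough proxy for life stages.
female_records = [r for r in records if r["sex"] == "female"]
age_report = representation_report(
    female_records, "age_band", ["18-39", "40-64", "65+"], min_share=0.2
)

for name, report in (("sex", sex_report), ("female age_band", age_report)):
    for group, (share, under) in report.items():
        flag = "UNDER-REPRESENTED" if under else "ok"
        print(f"{name}={group}: {share:.0%} {flag}")
```

In this toy data, women make up a third of the records and no woman over 39 appears at all, so the audit flags both gaps. Age bands are only a rough stand-in for life stages such as pregnancy or the menopause. A real audit would use whatever clinical fields the data set actually records.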

Look out for labelling bias

A common AI technique used in healthcare is supervised learning. This method uses ‘labelled’ data sets, in which both the inputs (e.g. the patient's symptoms) and the target labels (e.g. the disease we want the AI to detect) are already known. It is important that we think carefully about where these labels come from. Some medical labels are a ‘ground truth’ or ‘gold standard’, such as a laboratory- or biopsy-proven diagnosis, but others are based on a doctor’s clinical judgement. A label that comes from a human decision may be affected by human cognitive biases. If we assume that such labels are accurate, we may end up incorporating doctors’ cognitive biases into the model.

Cognitive biases such as availability bias, overconfidence and confirmation bias are well known to affect doctors’ diagnoses [8]. These factors should be carefully considered when selecting a labelled medical data set, to limit the risk of incorporating existing biases into AI models.
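Sometimes part of a data set carries a gold-standard label (for example a biopsy result) alongside the clinician's label. In that case, one way to look for labelling bias is to compare the two separately for each group. The Python sketch below computes a clinician miss rate per group: among the cases the gold standard marks as positive, the fraction a clinician labelled negative. The field names (`clinician_label`, `gold_label`, `sex`) and the toy records are hypothetical. A real audit would need far larger samples before drawing any conclusion.

```python
from collections import defaultdict


def clinician_miss_rate_by_group(records, group_key="sex"):
    """Share of gold-standard positives that the clinician labelled negative, per group."""
    positives = defaultdict(int)
    missed = defaultdict(int)
    for record in records:
        if record["gold_label"] is None:
            # No laboratory or biopsy result: this label cannot be audited.
            continue
        if record["gold_label"] == 1:
            group = record[group_key]
            positives[group] += 1
            if record["clinician_label"] == 0:
                missed[group] += 1
    return {group: missed[group] / positives[group] for group in positives}


# Hypothetical toy records (1 = disease present, 0 = absent, None = no gold standard).
records = [
    {"sex": "female", "clinician_label": 0, "gold_label": 1},
    {"sex": "female", "clinician_label": 1, "gold_label": 1},
    {"sex": "female", "clinician_label": 0, "gold_label": 1},
    {"sex": "female", "clinician_label": 0, "gold_label": None},
    {"sex": "male", "clinician_label": 1, "gold_label": 1},
    {"sex": "male", "clinician_label": 1, "gold_label": 1},
    {"sex": "male", "clinician_label": 0, "gold_label": 1},
    {"sex": "male", "clinician_label": 0, "gold_label": 0},
]

for group, rate in sorted(clinician_miss_rate_by_group(records).items()):
    print(f"{group}: clinician miss rate {rate:.0%}")
```

In this made-up example, clinicians miss two of three confirmed cases in women but only one of three in men. A gap like this would suggest that a model trained directly on the clinician labels could learn to under-diagnose women.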

Test technology across different patient groups

Once we have trained a model, it should be tested separately on different demographic groups, whether defined by age, gender, race or other factors, to identify whether any subgroup is being treated differently. A particular group may be inadequately represented in the training data set, or its data may have been labelled incorrectly. Either way, the model may produce misleading results for that group. Assessing subgroups independently helps reveal such weaknesses, which a single aggregate performance figure can hide.
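Here is a minimal Python sketch of subgroup testing. It assumes a binary classifier whose true labels, predictions and one demographic attribute are available for a held-out test set. The sketch computes sensitivity (true positive rate) and specificity (true negative rate) separately for each group. The function name `subgroup_metrics` and the toy arrays are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def subgroup_metrics(y_true, y_pred, groups):
    """Sensitivity and specificity of binary predictions, computed per subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            counts[group]["tp" if pred == 1 else "fn"] += 1
        else:
            counts[group]["fp" if pred == 1 else "tn"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # NaN when a group has no positive (or negative) cases to evaluate.
            "sensitivity": c["tp"] / positives if positives else float("nan"),
            "specificity": c["tn"] / negatives if negatives else float("nan"),
        }
    return results


# Hypothetical toy test set: five men followed by five women.
y_true = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
groups = ["male"] * 5 + ["female"] * 5

for group, m in sorted(subgroup_metrics(y_true, y_pred, groups).items()):
    print(f"{group}: n={m['n']}, sensitivity={m['sensitivity']:.2f}, "
          f"specificity={m['specificity']:.2f}")
```

Pooled over all ten cases, the model detects four of six true positives. That pooled figure hides the fact that it finds every positive case in men but only one in three in women. With subgroups this small the estimates are very noisy, so real evaluations should report sample sizes and, ideally, confidence intervals alongside each metric.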

Involve women in technology development

A gender-balanced team at all seniority levels helps ensure a gender-balanced product development process, which limits the influence of societal biases and norms on patients' health outcomes. Gender disparities in the STEM workforce are well documented and are particularly notable within AI [9].

Babylon isn’t and will never be exclusive – an experience just for the few – because everyone everywhere has the right to quality healthcare. We’re a democratic organisation, down to our core, so it’s natural that we apply this inclusive outlook to all we do. By continuing to empower women we aim to produce inclusive, effective and unbiased products for all women, everywhere.

Outside of the male and female gender norms, people with alternative gender identities such as non-binary or genderfluid must also be considered as part of this discussion. We are working towards making our products at Babylon universally accessible to ensure that everyone, regardless of their gender identity, feels comfortable using our products and does not experience bias.

At Babylon, we’re finding ways to mitigate some well-known issues of bias in machine learning models which can go undetected with current bias detection methods. By using our expertise in areas of machine learning, we are developing techniques to understand why models make certain decisions, to ensure that all users are treated fairly and equally.



  1. Chen EH, Shofer FS, Dean AJ, Hollander JE, Baxt WG, Robey JL, Sease KL, Mills AM. Gender disparity in analgesic treatment of emergency department patients with acute abdominal pain. Acad Emerg Med. 2008 May;15(5):414–8. doi: 10.1111/j.1553-2712.2008.00100.x. PMID: 18439195.
  2. Shapiro AP, Teasell RW. Misdiagnosis of chronic pain as hysteria and malingering. Current Review of Pain. 1998;2:19–28.
  3. Din NU, Ukoumunne OC, Rubin G, Hamilton W, Carter B, Stapley S, Neal RD. Age and gender variations in cancer diagnostic intervals in 15 cancers: analysis of data from the UK Clinical Practice Research Datalink. PLoS One. 2015 May 15;10(5):e0127717. doi: 10.1371/journal.pone.0127717. PMID: 25978414.
  4. Foulkes MA. After inclusion, information and inference: reporting on clinical trials results after 15 years of monitoring inclusion of women. Journal of Women's Health. 2011 Jun;20(6):829–36. doi: 10.1089/jwh.2010.2527. PMID: 21671773.
  8. Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak. 2016 Nov 3;16(1):138. doi: 10.1186/s12911-016-0377-1. PMID: 27809908; PMCID: PMC5093937.

The information provided is for educational purposes only and is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Seek the advice of a doctor with any questions you may have regarding a medical condition. Never delay seeking or disregard professional medical advice because of something you have read here.
