Development of a Natural Language Processing Application for LGBTQ+ Status in Mental Health Records

  • Margaret Heslin
  • Jaya Chaturvedi
  • Anne Marie Bonnici Mallia
  • Ace Taaca
  • Diogo Pontes
  • Charvi Saraswat
  • Charlotte Woodhead
  • Katharine Rimes
  • David Chandran
  • Jyoti Sanyal
  • Ruimin Ma
  • Robert Stewart
  • Angus Roberts

Research output: Contribution to journal › Article › peer-review


Abstract

Background
Lesbian, gay, bisexual, transgender, queer and related community (LGBTQ+) individuals are at significantly increased risk of mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited because LGBTQ+ status is usually recorded only in unstructured, free-text sections of electronic health records.

Aims
This study investigated whether natural language processing (NLP), specifically the transformer-based language model BERT (Bidirectional Encoder Representations from Transformers), can identify LGBTQ+ status from this unstructured text in mental health records.

Method
Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 text strings was extracted, each containing 100 characters on either side of a search term. These strings were annotated, and a BERT model was trained on them to classify LGBTQ+ status.
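The extraction step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the term list, the whole-word regular-expression matching and the seeded sampling are all assumptions, and the names `SEARCH_TERMS`, `extract_windows` and `sample_snippets` are hypothetical. It keeps 100 characters on each side of every matched term, clipped at the start and end of the note.

```python
import random
import re

# Hypothetical search terms for illustration only; the study's actual list is not given here.
SEARCH_TERMS = ["gay", "lesbian", "bisexual", "transgender", "queer"]
WINDOW = 100  # characters kept on each side of a matched term


def extract_windows(text, terms=SEARCH_TERMS, window=WINDOW):
    """Return (term, snippet) pairs; each snippet spans `window` characters either side of a match."""
    # Whole-word, case-insensitive match on any search term (an assumed matching rule).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    snippets = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - window)   # clip at the start of the note
        end = min(len(text), m.end() + window)  # clip at the end of the note
        snippets.append((m.group(0), text[start:end]))
    return snippets


def sample_snippets(snippets, k, seed=0):
    """Draw a reproducible random sample of up to k snippets without replacement."""
    rng = random.Random(seed)
    return rng.sample(snippets, min(k, len(snippets)))
```

In this sketch, snippets from all notes would be pooled and passed to `sample_snippets(pooled, 10_000)` before annotation; matching whole words avoids pulling in unrelated substrings, at the cost of missing spelling variants that a real term list would need to cover.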

Results
Of the 10 000 annotated strings, 14% (1449) confirmed LGBTQ+ status and 86% (8551) did not; this latter group comprised strings indicating negative LGBTQ+ status, irrelevant mentions and unclear cases. Tested on 2000 annotations, the final BERT model achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).
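The reported metrics relate in a standard way: F1 is the harmonic mean of precision and recall. The sketch below computes all three from confusion counts and adds a percentile bootstrap for a 95% CI. The abstract does not say how its confidence intervals were derived, so the bootstrap is an assumed method, and the function names are hypothetical.

```python
import random


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (an assumed CI method), resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        tp = sum(1 for i in idx if y_true[i] and y_pred[i])
        fp = sum(1 for i in idx if not y_true[i] and y_pred[i])
        fn = sum(1 for i in idx if y_true[i] and not y_pred[i])
        scores.append(precision_recall_f1(tp, fp, fn)[2])
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

As a consistency check, `f1_from_pr(0.95, 0.93)` gives about 0.9399, which rounds to the reported F1 of 0.94.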

Conclusion
LGBTQ+ status can be identified in free-text mental health records with high accuracy (F1 score 0.94) using this NLP application. The application opens up mental health records to a variety of research questions involving LGBTQ+ status, which should be explored further. Future work should extend this approach by developing an application that distinguishes between different LGBTQ+ groups, so that inequalities between these groups can be examined.
Original language: English
Journal: BJPsych Open
Early online date: 13 Oct 2025
Publication status: E-pub ahead of print, 13 Oct 2025
