Development of a Natural Language Processing Application for LGBTQ+ Status in Mental Health Records

Margaret Heslin, Jaya Chaturvedi, Anne Marie Bonnici Mallia, Ace Taaca, Diogo Pontes, Charvi Saraswat, Charlotte Woodhead, Katharine Rimes, David Chandran, Jyoti Sanyal, Ruimin Ma, Robert Stewart, Angus Roberts

Research output: Contribution to journal, Article, Peer-review


Abstract

Background
Individuals from lesbian, gay, bisexual, transgender, queer and related communities (LGBTQ+) are at significantly increased risk of mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited, because LGBTQ+ status is usually recorded only in unstructured, free-text sections of electronic health records.

Aims
This study investigated whether natural language processing (NLP), specifically the large language model Bidirectional Encoder Representations from Transformers (BERT), can identify LGBTQ+ status from this unstructured text in mental health records.

Method
Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 strings was extracted. Each string contained 100 characters on either side of a search term. A BERT model was then trained to classify LGBTQ+ status.
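The string-extraction step can be sketched as follows. This is a minimal illustration, assuming case-insensitive whole-word regular-expression matching. The search terms, function names and seed are hypothetical, because the abstract does not list the actual terms or describe how matches were located, how overlapping windows were handled or how the random sample was drawn.

```python
import random
import re

# Hypothetical search terms, for illustration only: the abstract does not list the study's terms.
SEARCH_TERMS = ["gay", "lesbian", "bisexual", "transgender", "queer"]
WINDOW = 100  # characters kept on either side of each matched term


def extract_windows(text, terms=SEARCH_TERMS, window=WINDOW):
    """Return one string per term match: the match plus up to `window` characters on each side."""
    # Case-insensitive, whole-word matching, so "gay" does not match inside "Gayle".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    windows = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)      # clip at the start of the note
        end = min(len(text), match.end() + window)  # clip at the end of the note
        windows.append(text[start:end])
    return windows


def sample_windows(documents, n, seed=0):
    """Pool the windows from all documents and draw a simple random sample of up to n strings."""
    pool = [w for doc in documents for w in extract_windows(doc)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(pool, min(n, len(pool)))


if __name__ == "__main__":
    note = "Patient reports that she is lesbian and lives with her partner."
    print(extract_windows(note))
```

Windows are clipped at the boundaries of a note, so the short note in the demo is returned whole as a single string. In this sketch, nearby matches each produce their own, possibly overlapping, window; a real pipeline might merge or deduplicate them.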

Results
Of the 10 000 annotations, 14% (1449) confirmed LGBTQ+ status, while the remaining 86% (8551) fell into other categories: LGBTQ+ negative status, irrelevant annotations and unclear cases. The final BERT model, tested on 2000 annotations, achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).
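The results appear to treat the task as binary: LGBTQ+ status confirmed versus all other categories. As a consistency check, F1 is the harmonic mean of precision and recall, and 2 × 0.95 × 0.93 / (0.95 + 0.93) ≈ 0.94, which matches the reported value. Below is a minimal sketch of how such metrics and 95% CIs could be computed. It assumes a percentile bootstrap over test items, since the abstract does not say how the intervals were derived. The label string `confirmed` and all function names are hypothetical.

```python
import random

# Hypothetical label name: the abstract reports "confirmed" LGBTQ+ status versus other
# categories (negative status, irrelevant, unclear) but does not give the label strings used.
POSITIVE = "confirmed"


def to_binary(labels, positive=POSITIVE):
    """Collapse annotation categories into 1 (LGBTQ+ status confirmed) or 0 (everything else)."""
    return [1 if label == positive else 0 for label in labels]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1); returns 0.0 where undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for (precision, recall, F1), resampling test items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(precision_recall_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    # Indices of the alpha/2 and 1 - alpha/2 percentiles in the sorted bootstrap values.
    lo_idx = round(n_boot * alpha / 2)
    hi_idx = min(n_boot - 1, round(n_boot * (1 - alpha / 2)) - 1)
    intervals = []
    for k in range(3):  # 0 = precision, 1 = recall, 2 = F1
        values = sorted(d[k] for d in draws)
        intervals.append((values[lo_idx], values[hi_idx]))
    return intervals
```

Resampling whole test items keeps precision, recall and F1 paired within each draw. Analytic intervals, such as Wilson intervals for precision and recall, are a common alternative.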

Conclusion
LGBTQ+ status can be identified using this NLP application with high precision and recall. The application opens mental health records to a range of research questions involving LGBTQ+ status, which should be explored further. Future work should extend this approach by developing an application that distinguishes between different LGBTQ+ groups, so that inequalities between these groups can be examined.
Original language: English
Journal: BJPsych Open
Early online date: 13 Oct 2025
Publication status: E-pub ahead of print, 13 Oct 2025
