Immune-mediated inflammatory diseases (IMIDs) such as systemic lupus erythematosus (SLE), rheumatoid arthritis (RA) and psoriasis represent a significant public health problem worldwide [1,2]. These are complex conditions arising from poorly understood mechanisms that lead to dysregulated immune and inflammatory responses.
Both genetic and non-genetic factors are involved in pathogenesis. Depending on the specific disease, the abnormal inflammatory process may compromise the function of a single tissue or multiple tissues and give rise to variable clinical manifestations.
Currently available therapies are not curative and do not control the disease effectively in every patient. Thus, there is a pressing need for a deeper understanding of pathogenesis, as well as for the identification of clinically useful biomarkers and the development of novel treatment strategies.
Studies carried out over the last several decades have led to the accumulation of large amounts of data related to the function of the human immune system, including in the context of IMIDs [3,4].
Datasets encompassing electronic health records, genomics (including disease-associated variants), epigenetics, transcriptomics (bulk, single-cell and spatial), proteomics, metabolomics, immune cell phenotypes and the microbiome have been or are being deposited in repositories such as ImmPort (www.immport.org), Sage Bionetworks (sagebionetworks.org), and the Human Immunome Project (www.humanimmunomeproject.org).
A new frontier in this effort is the collection of data related to the exposome, i.e., the environmental exposures (housing conditions, diet, tobacco and alcohol use, and air pollution) that influence disease development, potentially through epigenetic alterations [5,6].
Many data repositories are publicly accessible and represent very valuable resources for analyses to facilitate the understanding and treatment of IMIDs.
The extreme size and complexity of the immunology- and IMID-related datasets require the use of sophisticated computational methods to extract useful information from them. Artificial intelligence (AI)-based strategies are particularly suited to this purpose since they have the power to mine and integrate large quantities of heterogeneous data to detect patterns that are otherwise not discernible and that correlate with disease occurrence and course, variations in tissue pathology and response to treatment.
These patterns and correlations can, in turn, be used as the basis for the development of novel clinical applications. For instance, they may point to specific changes in gene expression that can be used as biomarkers predictive of disease course, treatment response, or the occurrence of treatment-associated adverse events. Such biomarkers can help in choosing the most appropriate therapy for a particular type of patient.
The patterns and correlations may also identify potential targets for new drugs or suggest strategies to prevent disease exacerbation (e.g., by avoiding a specific dietary component that correlates with worsening of symptoms). AI-based analysis of datasets related to immune function and IMIDs has already proven successful in achieving some of these desirable outcomes, including the development of novel drugs for the treatment of chronic inflammation [7].
AI-based analysis of big data does have its limitations [4]. The detection of usable patterns and correlations depends on how well the model is trained, which in turn depends on the quality of the datasets used for training.
Currently, there is significant non-uniformity in the datasets deposited in various repositories, particularly with respect to the completeness of the metadata (e.g., type, quality and processing of samples, and clinical parameters), which can hamper both training and analysis. In addition, it is important to remember that individual patients may not conform to AI-generated predictions that are based on populations. This issue is particularly pertinent if the characteristics of the individual (e.g., sex, race, ethnicity) are not well represented in the training dataset.
AI models are also prone to hallucinations, a persistent problem that raises questions about the reliability of some of the results that they produce [8,9]. Finally, the patterns, correlations and predictions that result from AI-based analysis remain theoretical until they are experimentally validated.
ThinkBio’s AI-powered platform for analyzing IMID-related data is uniquely equipped to address the limitations mentioned above in several ways. It will make use of carefully curated sources of data to maximize reliability and representation.
The analytic approach will integrate data of multiple types (omics, clinical, imaging, cytometry, etc.) and will employ different AI models in order to strengthen results and interpretations. In addition, major predictions will be subjected to experimental validation through collaborations with investigators who have the relevant laboratory or clinical resources and expertise.
Finally, as both the data and the AI models are continuously refined as a result of technological advances and experience, many of the limitations are likely to become less significant or easier to deal with. Thus, there is every reason to be optimistic that ThinkBio will be in a strong position to use AI-driven analysis of immunological data to aid in understanding and managing IMIDs.
References
1. Nat Rev Rheumatol. 2021; 17: 515.
2. Ann Rheum Dis. 2023; 82: 351.
3. Lancet Rheumatol. 2023; doi: 10.1016/S2665-9913(23)00010-3.
4. Arch Immunol Ther Exp (Warsz). 2024; 72: doi: 10.2478/aite-2024-0006.
5. Autoimmun Rev. 2024; 23: doi: 10.1016/j.autrev.2024.103584.
6. Nat Med. 2025; 31: 1738.
7. Annu Rev Biomed Data Sci. 2025; 8: 447.
8. Ann Pediatr Endocrinol Metab. 2025; 30: 115.
9. Annu Rev Biomed Data Sci. 2025; doi: 10.1146/annurev-biodatasci-103123-095406.