Fast Data Science
Natural language processing consultant specialising in healthcare and pharma
17/03/2026
https://fastdatascience.com/generative-ai/openai-vs-claude-vs-qwen/
How well do the newest large language models perform? How do they compare to 2025's offerings?
We evaluated the newest large language models on a law test prepared by Eugenio Vaccari. The Chinese entrants DeepSeek and Qwen have challenged the dominance of GPT, although most UK users use GPT, Gemini, and Claude. Scores are heading towards 80% on the law test when the bots are combined with a RAG system (a database of English insolvency statutes and case law), whereas two years ago we were at only around 30%. What is fascinating is that the Chinese models are delivering a similar performance to the American juggernauts, for a fraction of the cost.
We've plotted the AI models' performance on the law exam with model release date on the x-axis, so you can see at a glance how rapidly the field is advancing. I posted an earlier version of this graph a year ago, but now we have data going back to 2024 and the early days of GPT 3.5, which is very exciting.
Meanwhile, the House of Lords has put out a report recommending that the UK government implement some protections for creative industries and force AI companies to be transparent about where their training data came from. https://fastdatascience.com/legal-ai/ai-copyright/
https://fastdatascience.com
09/03/2026
https://fastdatascience.com/generative-ai/ai-bubble/
Are we in an AI bubble?
AI adoption is rising but the economy hasn't yet shown signs of improved productivity across the board. How does the conjectured "AI bubble" compare to bubbles of the past such as the dot-com bubble, and what can we expect from the next few years? This blog post does not contain any AI slop.
29/12/2025
https://fastdatascience.com/ai-for-business/predict-customer-churn-machine-learning-ai/
We've just published a deep dive into how we use Machine Learning and AI to predict customer churn.
* Why time splits (e.g., training on 2024 data and testing on 2025 data) are the best way to prove your model can handle macroeconomic shifts and changing behaviors.
* Model performance: How we can achieve 70-80% AUC using a Random Forest model by integrating customer account-level data and web analytics.
* Human vs. Machine: Comparing "human-executable" scoring models with black box models.
* The deployment gap: Why deploying a model is often more work than building it, from batch jobs
If you're in Retail, SaaS, or any B2C field, understanding these patterns isn't just a "nice-to-have": it's the difference between scaling and stagnating.
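To make the time-split idea concrete, here is a minimal sketch in plain Python. It is not the pipeline from the article: the records, field names and scoring weights are all made up for illustration, and a simple "human-executable" score stands in for the Random Forest. The point is only that the model is designed on 2024 rows and then judged on 2025 rows it has never seen, with AUC computed as the share of churner/non-churner pairs ranked correctly.

```python
from datetime import date

# Hypothetical records: (snapshot_date, logins_last_30d, support_tickets, churned)
RECORDS = [
    (date(2024, 3, 1), 20, 0, 0),
    (date(2024, 5, 1), 2, 3, 1),
    (date(2024, 8, 1), 15, 1, 0),
    (date(2024, 11, 1), 1, 2, 1),
    (date(2025, 1, 1), 18, 0, 0),
    (date(2025, 2, 1), 3, 4, 1),
    (date(2025, 4, 1), 12, 2, 0),
    (date(2025, 6, 1), 5, 0, 1),
    (date(2025, 7, 1), 4, 1, 0),
]


def time_split(rows, cutoff):
    """Split by date rather than at random: the past trains, the future tests."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def risk_score(logins, tickets):
    """Illustrative hand-made score: many tickets and few logins mean higher risk."""
    return 2 * tickets - 0.5 * logins


def auc(scores, labels):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(rows):
    scores = [risk_score(logins, tickets) for _, logins, tickets, _ in rows]
    labels = [churned for *_, churned in rows]
    return auc(scores, labels)


train, test = time_split(RECORDS, cutoff=date(2025, 1, 1))
train_auc = evaluate(train)
test_auc = evaluate(test)
print(f"2024 AUC: {train_auc:.2f}, 2025 AUC: {test_auc:.2f}")
```

On this toy data the score separates the 2024 customers perfectly but does noticeably worse on 2025, which is the kind of drift a random split would hide.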
19/12/2025
https://fastdatascience.com/ai-in-research/jicl-paper/
Publication Announcement: A Generative AI-Based Legal Advice Tool for Small Businesses in Distress
We are thrilled to announce the publication of our latest paper in the Journal of International and Comparative Law (JICL):
Marton Ribary, Thomas Wood, Miklos Orban, Eugenio Vaccari, Paul Krause, A Generative AI-Based Legal Advice Tool for Small Businesses in Distress. Journal of International and Comparative Law, Vol 12.2, 2025.
Small business owners often face a "justice gap" when dealing with the complexities of corporate insolvency. To address this, our team developed the Insolvency Bot, a Retrieval-Augmented Generation (RAG) system specifically designed to provide information about English and Welsh insolvency law.
In head-to-head testing against unmodified models like GPT-4, the Insolvency Bot significantly outperformed them in legal accuracy and reliability. The system leverages a curated database of 6,000 legal texts, including statutes, case law, and HMRC forms.
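For readers new to RAG, the retrieval step can be sketched in a few lines of Python. This is a toy illustration, not the Insolvency Bot's code: the three passages are placeholders, the document IDs are invented, and word-count cosine similarity stands in for whatever retrieval method the real system uses. The sketch stops at building the prompt and does not call any language model.

```python
import math
import re
from collections import Counter

# Placeholder passages (hypothetical); a real system would index statutes, case law and forms.
PASSAGES = {
    "doc-a": "placeholder passage about a winding up petition against a company",
    "doc-b": "placeholder passage about a director's duties when a company is insolvent",
    "doc-c": "placeholder passage about filing a tax form",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the IDs of the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        PASSAGES,
        key=lambda doc_id: cosine(q, Counter(tokenize(PASSAGES[doc_id]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, k=2):
    """Put the retrieved passages in front of the question, ready to send to a model."""
    context = "\n".join(f"[{doc_id}] {PASSAGES[doc_id]}" for doc_id in retrieve(question, k))
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"


question = "What are a director's duties if the company is insolvent?"
top_ids = retrieve(question)
prompt = build_prompt(question)
print(prompt)
```

Grounding the answer in retrieved sources, rather than relying on the model's memory alone, is the design choice that distinguishes a RAG system from an unmodified model.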
What's next? We are currently working with partners around the world to develop equivalents in:
* Bhutan
* India
* Eight European jurisdictions
28/11/2025
https://clinicaltrialrisk.org/clinical-trial-protocol-software/alan-turing-institute/
Thomas Wood, director of Fast Data Science, has presented the Clinical Trial Risk Tool in a brief two-minute intro at the Clinical AI Interest Group, organised by the Alan Turing Institute.
The group is a community of health professionals from a broad range of backgrounds with an interest in Clinical AI.
In the November 2025 meeting, the talk was given by Dr Jeff Hogg, Programme Director, MSc AI Implementation (Healthcare), University of Birmingham and Clinical Innovation Officer in AI, University Hospitals Birmingham NHSFT, titled *AI Readiness for Health and Care Provider Organisations*. Just before the talk, at 2:40, Thomas Wood presents the Clinical Trial Risk Tool https://clinicaltrialrisk.org/
27/10/2025
https://youtu.be/QRfeUD2Y5Is
This new video explains natural language processing: what it is, how it works, and what it can do for your organisation. Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that focuses on giving computers the ability to understand human language, combining disciplines like linguistics, computer science, and engineering.
Companies need NLP because much of their crucial data, especially in industries like insurance, healthcare, and pharmaceuticals, exists as unstructured text in formats like PDFs, scanned documents, or audio, which computers struggle to process compared to clean numerical data.
The most effective way a business can get value out of NLP is by implementing it as part of a wider strategic initiative, such as the development of a predictive risk model or cost model, as in our example of clinical trials. This allows the company's C-level to turn unstructured text documents into a quantifiable risk or cost estimate for the next quarter or year, delivering a phenomenal return on investment and a competitive advantage, especially in traditionally conservative industries. While lower-impact initiatives can save staffing costs (e.g., by triaging customer support), the highest impact comes from these larger strategic projects that provide predictive business insights.
17/10/2025
https://clinicaltrialrisk.org/rct-cost-modelling/clinical-trial-budget-from-charge-master/
How can you use the Clinical Trial Risk Tool to create a per-subject budget from a protocol or synopsis and a site Charge Master? This video and article will walk you through how the Clinical Trial Risk Tool by Fast Data Science can accelerate your budgeting. The Clinical Trial Risk Tool streamlines the creation of a per-subject budget by automating the typically manual process of extracting data from the Study Protocol and cross-referencing it with Charge Master/Fee Schedules.
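The cross-referencing step can be pictured with a small Python sketch. It does not show how the Clinical Trial Risk Tool works internally: the procedure names, visit counts and prices below are invented, and the function name is hypothetical. It simply multiplies how often each procedure occurs per subject by its Charge Master unit price and flags anything without a price.

```python
# Hypothetical schedule of events taken from a protocol: procedure -> occurrences per subject
SCHEDULE = {"ECG": 3, "Blood draw": 5, "Physical exam": 2}

# Hypothetical site Charge Master: procedure -> unit price
CHARGE_MASTER = {"ECG": 120.0, "Blood draw": 45.0, "Physical exam": 200.0, "MRI": 900.0}


def per_subject_budget(schedule, charge_master):
    """Return budget lines, the per-subject total, and procedures that have no price."""
    lines, missing = [], []
    for procedure, count in schedule.items():
        if procedure not in charge_master:
            missing.append(procedure)  # needs a manual price lookup
            continue
        unit_price = charge_master[procedure]
        lines.append((procedure, count, unit_price, count * unit_price))
    total = sum(line[3] for line in lines)
    return lines, total, missing


lines, total, missing = per_subject_budget(SCHEDULE, CHARGE_MASTER)
for procedure, count, unit_price, cost in lines:
    print(f"{procedure}: {count} x {unit_price:.2f} = {cost:.2f}")
print(f"Per-subject total: {total:.2f}")
```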
15/09/2025
https://fastdatascience.com/ai-for-business/predict-cost-of-projects/
How can we predict the cost of a project with AI?
There are two approaches: bottom-up costing (also called activity-based costing), and top-down costing, or reference class forecasting. Reference class forecasting involves looking for analogous projects in the past and using them to make a sensible estimate of the cost of the future project.
We've been exploring the role AI can play in both of these contrasting forecasting techniques.
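As a rough illustration of how the two approaches can be combined, the Python sketch below adjusts a bottom-up estimate using the cost overruns seen in a reference class of past projects. All figures are invented, and the use of the median and a nearest-rank 80th percentile is an assumption made for the example rather than a method taken from the article.

```python
import math
import statistics

# Hypothetical reference class: (estimated_cost, actual_cost) of analogous past projects
PAST_PROJECTS = [(100, 110), (200, 250), (80, 76), (50, 70), (150, 180)]


def overrun_ratios(projects):
    """Actual cost divided by estimated cost for each past project, sorted ascending."""
    return sorted(actual / estimate for estimate, actual in projects)


def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def reference_class_forecast(bottom_up_estimate, projects):
    """Scale a bottom-up estimate by the typical and the pessimistic historical overrun."""
    ratios = overrun_ratios(projects)
    typical = bottom_up_estimate * statistics.median(ratios)
    pessimistic = bottom_up_estimate * nearest_rank_percentile(ratios, 80)
    return typical, pessimistic


typical, pessimistic = reference_class_forecast(100_000, PAST_PROJECTS)
print(f"Typical outcome: {typical:,.0f}; 80th percentile: {pessimistic:,.0f}")
```

The bottom-up figure supplies the project-specific detail, while the reference class corrects for the optimism that such estimates have tended to show in the past.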
05/09/2025
https://fastdatascience.com/natural-language-processing/country-named-entity-recognition/
Liz Stanley has updated the Country Named Entity Recognition to better handle country names. Pycountry (our dependency) had to be updated because the government of Turkey has requested that the English speaking world use the name Türkiye. Thanks Liz!
Install with
pip install country-named-entity-recognition
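Once installed, a call might look like the sketch below. It assumes the package exposes a `find_countries` function that returns a list of (country, match) pairs, where the country is a pycountry object and the match is a regular-expression match; check the project README for the current interface. The example sentence is made up, and the import is guarded so the snippet still runs if the package is missing.

```python
try:
    from country_named_entity_recognition import find_countries
except ImportError:
    # Package not installed; run the pip command above first.
    find_countries = None

text = "We are expanding in the UK and Türkiye"

# Assumed return shape: a list of (pycountry country, re.Match) tuples.
matches = find_countries(text) if find_countries else []

for country, match in matches:
    print(country.name, match.span())
```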
16/07/2025
https://fastdatascience.com/drug-named-entity-recognition-python-library
Abdullah Waqar has updated the Drug Named Entity Recognition to get molecular masses from formulae and the Pubchem API. Thanks Abdullah!
Now you can retrieve arbitrary information about medical drugs from a number of sources with a simple Python installation. The tool also has a Google Sheets plugin.
Install with
pip install drug-named-entity-recognition
Opening Hours
| Monday | 9am - 5pm |
| Tuesday | 9am - 5pm |
| Wednesday | 9am - 5pm |
| Thursday | 9am - 5pm |
| Friday | 9am - 5pm |