How can I use Azure OpenAI to extract structured toxicology data from reports (e.g., for Appendix E population)

Pratham Mahajan 0 Reputation points
2025-07-30T05:30:11.5166667+00:00

Hi Everyone,

I am working on a use case of extracting structured toxicology data from unstructured study reports, where I have to fill in a standardized Excel report (namely the EDGD Excel report from Appendix E).

In the report we are given 250 parameters to fill in, as well as 10 queries for each parameter. I am using Azure AI Search + Azure OpenAI models to do this, querying for each parameter and extracting the relevant information.

The models I have used are GPT-4o and GPT-4o-mini, but both give me poor results because they struggle to evaluate highly scientific data.

Can you suggest another approach, another model in a similar price range, or a correction to my current approach to improve accuracy?

If anyone has worked on a similar use case (such as HAWC, SRT, OHAT workflows, or automated Appendix E population), I would really appreciate learning from your experience.

Thanks in advance!

Azure OpenAI Service

2 answers

  1. Jerald Felix 4,450 Reputation points
    2025-07-30T05:44:08.8133333+00:00

    Hello Pratham Mahajan,

    Extracting structured toxicology data from complex, unstructured reports is a tough challenge—especially when you’re dealing with scientific nuance and 250+ parameters. You’re right that out-of-the-box models like GPT-4o sometimes struggle with domain-specific data, especially when it’s highly detailed or technical.

    Here are some ideas that could help improve your process:

    Fine-tune on Domain Data: If you can, create a custom fine-tuning dataset with examples of unstructured input and the exact structured output you want (even if it’s just a few hundred rows). Fine-tuned models—whether on OpenAI, Azure OpenAI, or other providers—tend to output domain-specific results with much higher accuracy.
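
    As a rough sketch of what one training row could look like (Azure OpenAI fine-tuning expects JSON Lines in the chat format; the excerpt, parameter names, and values below are purely illustrative):

    ```python
    import json

    # One illustrative training row in the chat format used for fine-tuning.
    # The excerpt, parameter names, and values are made up for the example.
    example_row = {
        "messages": [
            {"role": "system",
             "content": "Extract toxicology study parameters and answer in JSON."},
            {"role": "user",
             "content": "Report excerpt: 'Rats (10/sex/group) received 0, 50, 150 or "
                        "450 mg/kg bw/day by gavage for 90 days...' "
                        "Extract: species, route, duration_days, NOAEL."},
            {"role": "assistant",
             "content": json.dumps({
                 "species": "rat",
                 "route": "oral (gavage)",
                 "duration_days": 90,
                 "NOAEL_mg_per_kg_bw_day": 150
             })}
        ]
    }

    # Fine-tuning files are JSON Lines: one such object per line.
    with open("toxicology_finetune.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(example_row) + "\n")
    ```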

    Consider Domain-Specific Models: You might want to explore models trained for biomedical or scientific text, such as BioGPT, SciBERT, or even open models hosted on Hugging Face. These are designed to deal with clinical and scientific language.

    Break Down the Task: Instead of having the model pull all 250 parameters at once, break extraction down into smaller, more focused prompts. For instance, extract one section or a handful of related variables at a time. You could also use a two-stage process: first, let the model find relevant sections, then have another pass that extracts the precise values.

    Use Post-Processing: Sometimes models will give you “almost right” results that can be cleaned up with rule-based scripts. Combine large language model output with regular expressions or Python code to standardize units, enforce value ranges, or match patterns known from your Excel’s structure.
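
    For instance, a small rule-based pass can normalize dose units and flag out-of-range values before anything lands in the Excel sheet. This is only a sketch; the unit conversions and expected ranges are illustrative assumptions:

    ```python
    import re

    def normalize_dose(raw: str) -> str | None:
        """Normalize free-text doses like '0.15 g/kg/d' or '450 mg/kg bw/day'
        to a canonical 'mg/kg bw/day' string. Returns None if nothing matches."""
        match = re.search(r"([\d.]+)\s*(mg|g|µg|ug)\s*/\s*kg", raw, re.IGNORECASE)
        if not match:
            return None
        value, unit = float(match.group(1)), match.group(2).lower()
        factor = {"g": 1000, "mg": 1, "µg": 0.001, "ug": 0.001}[unit]
        return f"{value * factor:g} mg/kg bw/day"

    def needs_review(value: float, low: float = 0, high: float = 10000) -> bool:
        """Flag values outside an expected range for human review."""
        return not (low <= value <= high)

    print(normalize_dose("0.15 g/kg/d"))   # -> '150 mg/kg bw/day'
    print(needs_review(450))               # -> False
    ```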

    Human-in-the-Loop: For especially critical or ambiguous fields, consider flagging low-confidence results for manual review. LLMs work best when they augment, not completely replace, expert judgment—especially in regulatory or scientific environments.

    Check for HAWC, SRT, or OHAT Workflows: Investigate if these established toxicology workflows have open-source scripts, annotation sets, or guidance on automating Appendix E population. Sometimes, adapting what’s already out there saves weeks of work.

    It sounds like you’re already piecing together a solid system. A bit of domain adaptation—via fine-tuning, smarter prompt design, or hybrid automation—should boost accuracy. Good luck! Your use case is important, and every incremental enhancement to quality here can have a big impact.

    Best Regards,

    Jerald Felix


  2. Nikhil Jha (Accenture International Limited) 230 Reputation points Microsoft External Staff Moderator
    2025-08-07T09:38:12.6966667+00:00

    Hello Pratham Mahajan,

    Thanks for sharing your use case—extracting structured toxicology data from complex scientific study reports is indeed a challenging but impactful task. GPT-4o and GPT-4o-mini offer great latency and cost efficiency, but for highly scientific content extraction they may not perform well without added structure. You're on the right track by combining Azure AI Search and Azure OpenAI.

    Adding on to the suggestions from community contributor Jerald Felix (kudos to the community here 🙌).

    Let me offer some more guidance to improve accuracy and reliability:

    1. Use a Domain-Tuned Model via Prompt Engineering

    • Prompt Engineering: Structure your prompts with few-shot examples (showing how similar input maps to expected output). This helps LLMs anchor their generation in your required format.
    • Alternatively, leverage Azure OpenAI’s Custom Prompt Flow or Azure Machine Learning pipelines to wrap domain context around your prompt reliably.
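
    A minimal sketch of the few-shot pattern with the Azure OpenAI Python SDK (the deployment name, API version, example pair, and excerpt are placeholders you would replace with your own):

    ```python
    import os
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )

    # Text retrieved for the current parameter (placeholder here).
    retrieved_chunk = ("Rats (10/sex/group) received 0, 50, 150 or 450 mg/kg bw/day "
                       "by gavage for 90 days.")

    messages = [
        {"role": "system",
         "content": "You extract toxicology parameters. Answer only with JSON."},
        # Few-shot example: one input/output pair in the exact target format.
        {"role": "user",
         "content": "Excerpt: 'Beagle dogs were dosed dermally for 28 days...' "
                    "Extract: species, route, duration_days."},
        {"role": "assistant",
         "content": '{"species": "dog", "route": "dermal", "duration_days": 28}'},
        # The real query follows the same pattern.
        {"role": "user",
         "content": f"Excerpt: '{retrieved_chunk}' Extract: species, route, duration_days."},
    ]

    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=messages,
        temperature=0,
        response_format={"type": "json_object"},
    )
    print(response.choices[0].message.content)
    ```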

    2. Switch to GPT-4 Turbo or Use a Fine-Tuned GPT-3.5

    • If budget allows, try GPT-4 Turbo (via Azure OpenAI’s gpt-4-turbo) which has better reasoning capability.
    • Alternatively, a fine-tuned GPT-3.5 model on your toxicology report samples might yield better results than 4o-mini.

    3. Preprocess Documents Using Azure Document Intelligence

    Use Azure AI Document Intelligence to extract layout, tables, and structure from PDF reports before passing data to LLMs. This preserves context like headings, units, or table mappings which are crucial in scientific reports.

    Architecture Summary:
    PDF ➝ Azure Document Intelligence ➝ Azure AI Search ➝ Azure OpenAI (with structured prompt) ➝ Excel output mapping
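
    A minimal sketch of this preprocessing step, here using the azure-ai-formrecognizer SDK with the prebuilt layout model (endpoint, key, and file name are placeholders):

    ```python
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient  # pip install azure-ai-formrecognizer

    client = DocumentAnalysisClient(
        endpoint=os.environ["DOCINTEL_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["DOCINTEL_KEY"]),
    )

    # Analyze the study report with the prebuilt layout model so headings,
    # paragraphs, and tables stay intact before chunking and indexing.
    with open("study_report.pdf", "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
    result = poller.result()

    for table in result.tables:
        print(f"Table: {table.row_count} rows x {table.column_count} columns")
        for cell in table.cells:
            print(cell.row_index, cell.column_index, cell.content)

    for paragraph in result.paragraphs:
        print(paragraph.content)
    ```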

    4. Hybrid Semantic Search + Extraction Pipeline

    You're already using Azure AI Search; here's how to enhance it further:

    • Index each toxicology report with semantic + vector embeddings.
    • Query the index with each of the 250 parameters.
    • Use chunked matching with vector scoring and send the most relevant chunk to GPT-4 or GPT-3.5 for structured output generation.
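
    A rough sketch of the retrieval side with the azure-search-documents SDK; the index name, field names, and embedding deployment are assumptions about your setup, and the returned chunk would then go into an extraction prompt like the one shown above:

    ```python
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient          # pip install azure-search-documents
    from azure.search.documents.models import VectorizedQuery
    from openai import AzureOpenAI                            # pip install openai

    search_client = SearchClient(
        endpoint=os.environ["SEARCH_ENDPOINT"],
        index_name="toxicology-reports",                      # assumed index name
        credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
    )
    aoai = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )

    def top_chunks(parameter: str, k: int = 3) -> list[str]:
        """Hybrid query (keyword + vector) returning the best-matching chunks."""
        embedding = aoai.embeddings.create(
            model="text-embedding-3-large",                   # your embedding deployment
            input=parameter,
        ).data[0].embedding
        results = search_client.search(
            search_text=parameter,                            # keyword side of the hybrid query
            vector_queries=[VectorizedQuery(
                vector=embedding, k_nearest_neighbors=k, fields="contentVector")],
            top=k,
        )
        return [doc["content"] for doc in results]            # assumed field names

    # Each of the 250 parameters gets its own focused retrieval + extraction call.
    print(top_chunks("NOAEL (90-day oral study, rats)"))
    ```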

    Reference links (these might help you follow the above suggestions):

