Unexpected None Outputs in Azure AI Language Custom Text Classifier

jroset 0 Reputation points
2025-07-08T15:18:53.7033333+00:00

Dear Azure Support,

Good evening. I am reaching out regarding an issue we are experiencing with the Azure AI Language service, specifically with the Custom Text Classification (multilingual, multilabel) models.

We have deployed several classification models that work on the same input but are trained to infer slightly different target labels. Despite identical infrastructure, code, and input text, we are observing that, in some cases, certain models return None as output. This occurs even when the classification threshold is set to 0, and the input text is clear, well-formed, and highly indicative of the target class.

What is particularly interesting is that for the same input:

  • All but one of the models return predictions
  • The problematic model consistently returns None

This leads us to suspect that the issue may be related to the training data, perhaps insufficient label representation or low label quality for specific categories. However, we couldn't find any relevant guidance or explanation in the official documentation to confirm this behavior.

Could this be an issue on Azure's side (e.g., model deployment, inference logic, or service instability)? Or is there a known limitation when it comes to multilabel/multilingual configurations?

We'd appreciate any insight or troubleshooting steps you could provide.

Thank you in advance for your support.

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. Prashanth Veeragoni 5,745 Reputation points Microsoft External Staff Moderator
    2025-07-09T07:15:00.8266667+00:00

    Hi jroset,

    I understand that you're using Custom Text Classification (multilingual, multilabel) models on Azure AI Language, and that for a given input text:

    •   Most models return valid predictions.

    •   One specific model consistently returns None, despite identical input, code, and infrastructure.

    •   The classification threshold is set to 0, so no class should be filtered out due to a low score.

    Possible Root Causes of the None-Prediction Issue:

    1. Sparse or Imbalanced Label Distribution in Training Data

    • If certain labels have very few examples, the model may not learn meaningful patterns.
    • Even with a threshold of 0, the model may internally decide the label lacks enough confidence to be included in predictions.
    • Microsoft does not explicitly document this, but internal heuristics may prevent low-confidence, poorly represented labels from being returned.

    2. Multilingual Behavior and Language Coverage

    • In multilingual mode, the model creates embeddings per language.
    • If the failing model was trained with text heavily skewed toward one language, predictions may fail silently for underrepresented languages.
    • Azure doesn’t currently provide per-language prediction fallback or clarity in multilingual class handling.
    3. Label Pruning or Suppression

    • During model training, labels that occur infrequently or don't contribute meaningfully may be internally pruned or deprioritized.
    • This could result in None being returned even for seemingly clear inputs.

    4. Model Deployment or Endpoint Inference Behavior

    • Although rare, service issues or inconsistencies in deployment may cause failures at inference time.
    • If the issue occurs only in production, this possibility should not be ruled out.
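    To tell a genuine "no labels above threshold" outcome apart from a transport or service error, it helps to inspect the raw classification list in the response rather than relying on a None in downstream code. The sketch below parses a response shaped like the service's custom multi-label classification result (field names follow the public REST docs, but treat the sample payload as a hypothetical illustration, not captured output):

```python
import json

# Hypothetical response in the shape of a custom multi-label classification
# result ("class" list with "category"/"confidenceScore" per document).
# The payload below is an illustration, not real service output.
SAMPLE_RESPONSE = """
{
  "documents": [
    {"id": "1", "class": [{"category": "billing", "confidenceScore": 0.91}]},
    {"id": "2", "class": []}
  ]
}
"""

def report(response_text):
    """Return one line per document, flagging empty class lists explicitly
    so a 'no labels' result is never confused with an error."""
    lines = []
    for doc in json.loads(response_text)["documents"]:
        classes = doc.get("class", [])
        if not classes:
            lines.append(f"doc {doc['id']}: NO LABELS RETURNED")
        else:
            labels = ", ".join(
                f"{c['category']}={c['confidenceScore']:.2f}" for c in classes
            )
            lines.append(f"doc {doc['id']}: {labels}")
    return lines

print("\n".join(report(SAMPLE_RESPONSE)))
```

    If the failing model returns a well-formed document with an empty class list (rather than an error object), that points toward training-data issues rather than a deployment or inference fault.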

    Troubleshooting Steps

    Step 1: Inspect the Training Data

    •   Use Azure Language Studio to explore the training data and verify that:
        •   Each label has at least 50 samples.
        •   Labels are properly balanced.
        •   Multilabel combinations appear frequently and naturally.
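    Label counts can also be checked programmatically against an exported project file. The sketch below assumes an export shaped like a Language Studio multi-label project (assets.documents[*].classes[*].category); verify the field names against your own export, as the sample structure here is an assumption:

```python
from collections import Counter

# Hypothetical project export shaped like a Language Studio multi-label
# export; replace with json.load(open("your_export.json")) in practice.
SAMPLE_EXPORT = {
    "assets": {
        "documents": [
            {"location": "doc1.txt", "language": "en-us",
             "classes": [{"category": "billing"}, {"category": "refund"}]},
            {"location": "doc2.txt", "language": "es",
             "classes": [{"category": "billing"}]},
        ]
    }
}

def label_counts(export, minimum=50):
    """Count examples per label and flag labels below the minimum."""
    counts = Counter(
        cls["category"]
        for doc in export["assets"]["documents"]
        for cls in doc.get("classes", [])
    )
    underrepresented = [c for c, n in counts.items() if n < minimum]
    return counts, underrepresented

counts, low = label_counts(SAMPLE_EXPORT)
for category, n in counts.items():
    print(f"{category}: {n} example(s)")
print("below minimum:", low)
```

    Any label the failing model is supposed to predict that shows up in the "below minimum" list is a strong candidate for the root cause.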

    Step 2: Test the Model with Language-Specific Inputs

    •   Run inference using inputs in the dominant training language.

    •   Compare the results with inputs in less-represented languages to assess whether the issue is multilingual-specific.
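    The comparison can be made concrete by tallying, per input language, how often the model returns an empty label list. The records below are hypothetical placeholders; in practice they would come from your inference loop over a held-out test set:

```python
from collections import defaultdict

# Hypothetical inference results: one record per input, with the detected
# language and the labels the model returned (empty list = no prediction).
results = [
    {"language": "en", "labels": ["billing"]},
    {"language": "en", "labels": ["refund", "billing"]},
    {"language": "es", "labels": []},
    {"language": "es", "labels": []},
]

def empty_rate_by_language(results):
    """Return, per language, the fraction of inputs with no labels returned."""
    totals, empties = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["language"]] += 1
        if not r["labels"]:
            empties[r["language"]] += 1
    return {lang: empties[lang] / totals[lang] for lang in totals}

rates = empty_rate_by_language(results)
for lang, rate in sorted(rates.items()):
    print(f"{lang}: {rate:.0%} of inputs returned no labels")
```

    A sharp skew (e.g., near-zero empty rate for the dominant language and a high rate for others) would support the multilingual-coverage hypothesis in point 2 above.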

    Step 3: Retrain the Failing Model

    •   Augment the training dataset by:
        •   Adding more diverse examples per label.
        •   Ensuring proper multilabel pairings are included.

    •   Optionally, switch to monolingual training if multilingual coverage is unnecessary.

    Hope this helps. Do let me know if you have any further queries.

    Thank you!

