Hi jroset,
I understand that you're using Custom Text Classification (multilingual, multilabel) models on Azure AI Language. For a given input text:
· Most models return valid predictions.
· One specific model consistently returns None, despite identical input, code, and infrastructure.
· The classification threshold is set to 0, so no class should be filtered out due to a low score.
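To make the threshold expectation concrete, here is a minimal sketch of what client-side threshold filtering would look like. The function name and the dict shape are hypothetical (loosely modeled on the classification results the service returns, not an actual SDK call); it just illustrates that with a threshold of 0, every scored class should survive filtering:

```python
# Hypothetical client-side view of threshold filtering. With a
# threshold of 0.0, any non-negative confidence score passes, so no
# class should ever be dropped by the threshold alone.
def apply_threshold(classifications, threshold=0.0):
    """Keep every classification whose score is at or above the threshold."""
    return [c for c in classifications if c["confidence_score"] >= threshold]

scores = [
    {"category": "billing", "confidence_score": 0.72},
    {"category": "refund", "confidence_score": 0.01},
]
print(apply_threshold(scores, threshold=0.0))  # both classes survive
```

So if the service returns None rather than a low-scored class, something other than the threshold is suppressing the label.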
Possible Root Causes of the None-Prediction Issue:
1. Sparse or Imbalanced Label Distribution in Training Data
- If certain labels have very few examples, the model may not learn meaningful patterns.
- Even with a threshold of 0, the model may internally decide the label lacks enough confidence to be included in predictions.
- Microsoft does not explicitly document this, but internal heuristics may prevent low-confidence, poorly represented labels from being returned.
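To check for this, you can audit label frequencies in your training data before retraining. The sketch below assumes a hypothetical export shape (a list of documents, each with a "labels" list), roughly mirroring what a Language Studio labels export contains, and flags labels with too few samples:

```python
from collections import Counter

def audit_label_counts(documents, min_samples=50):
    """Count how often each label appears across the training set and
    flag labels below a minimum sample count.

    `documents` is assumed to be a list of dicts with a "labels" list
    (hypothetical export shape, not an official schema).
    """
    counts = Counter(label for doc in documents for label in doc["labels"])
    sparse = {label: n for label, n in counts.items() if n < min_samples}
    return counts, sparse

docs = [
    {"text": "invoice question", "labels": ["billing", "refund"]},
    {"text": "charge dispute", "labels": ["billing"]},
]
counts, sparse = audit_label_counts(docs, min_samples=50)
print(sparse)  # both labels are far below 50 samples here
```

Any label that shows up in `sparse` is a candidate for the silent-suppression behavior described above.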
2. Multilingual Behavior and Language Coverage
- In multilingual mode, prediction quality depends on how well each language is represented in the training data.
- If the failing model was trained on text heavily skewed toward one language, predictions can fail silently for underrepresented languages.
- Azure doesn’t currently provide per-language prediction fallback or clarity in multilingual class handling.
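A quick way to surface this skew is to tally training samples per language code. As before, the document shape is an assumption (each dict carrying a "language" field), not an official export schema:

```python
from collections import Counter

def language_shares(documents):
    """Return each language's share of the training set, to surface
    skew toward a dominant language.

    Assumes each document dict carries a "language" field
    (hypothetical export shape).
    """
    counts = Counter(doc["language"] for doc in documents)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

docs = [{"language": "en"}] * 9 + [{"language": "fr"}]
print(language_shares(docs))  # en dominates at 90%
```

If the failing predictions correlate with a language that holds only a small share here, the multilingual-coverage explanation becomes much more likely.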
3. Label Pruning or Suppression
- During model training, labels that occur infrequently or don't contribute meaningfully may be internally pruned or deprioritized.
- This could result in None being returned even for seemingly clear inputs.
4. Model Deployment or Endpoint Inference Behavior
- Although rare, service issues or inconsistencies in deployment may cause failures in inference.
- If the issue occurs only in production, this possibility should not be ruled out.
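To rule out a transient service issue, you can retry the inference call a few times before concluding that the empty result is genuine model behavior. This is a generic retry sketch: `classify_fn` is a stand-in for whatever inference call you use (for example, a wrapper around the SDK's classification operation), not a real Azure API:

```python
import time

def classify_with_retry(classify_fn, text, attempts=3, delay=1.0):
    """Call a classification function up to `attempts` times, so that a
    transient service hiccup is not mistaken for a model that truly
    returns no classes.

    `classify_fn` is a placeholder for your own inference call; it
    should return a (possibly empty) list of predicted classes.
    """
    last = None
    for i in range(attempts):
        last = classify_fn(text)
        if last:  # non-empty prediction: not a transient failure
            return last
        if i < attempts - 1:
            time.sleep(delay)
    return last
```

If the model still returns nothing after several spaced-out attempts, the problem is almost certainly in the model or its training data rather than the endpoint.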
Troubleshooting Steps
Step 1: Inspect Training Data
· Use Azure Language Studio to explore training data and verify:
o Each label has at least 50 samples.
o Labels are properly balanced.
o Multilabel combinations appear frequently and naturally.
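The balance and multilabel checks above can also be scripted against an exported dataset. This sketch assumes the same hypothetical export shape as before (documents with a "labels" list) and reports the imbalance ratio plus how often each label pair co-occurs:

```python
from collections import Counter
from itertools import combinations

def check_balance(documents, max_ratio=10.0):
    """Report the ratio between the most and least frequent label,
    whether it exceeds `max_ratio`, and how often each multilabel
    pair co-occurs in the same document.

    The document shape is a hypothetical export format, not an
    official schema; `max_ratio` is an illustrative cutoff.
    """
    label_counts = Counter(l for d in documents for l in d["labels"])
    pair_counts = Counter(
        pair
        for d in documents
        for pair in combinations(sorted(set(d["labels"])), 2)
    )
    ratio = max(label_counts.values()) / min(label_counts.values())
    return ratio, ratio > max_ratio, pair_counts
```

A high ratio or a multilabel pair that never co-occurs in training is a strong hint as to why that combination is never predicted.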
Step 2: Test the Model with Language-Specific Inputs
· Run inference using inputs from the dominant training language.
· Compare results to inputs from less-represented languages to assess if the issue is multilingual-specific.
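Once you have run inference per language, a small helper makes the comparison mechanical. The input shape here is an assumption: a dict mapping each language code to the (possibly empty) list of predicted classes you collected:

```python
def languages_returning_empty(results_by_language):
    """Given {"en": [...predictions...], "fr": [], ...}, return the
    languages that came back with no classes, so a multilingual
    coverage gap can be told apart from a global model problem.
    """
    return sorted(
        lang for lang, preds in results_by_language.items() if not preds
    )

results = {"en": ["billing"], "fr": [], "de": []}
print(languages_returning_empty(results))  # ['de', 'fr']
```

If only underrepresented languages appear in the output while the dominant training language predicts fine, the issue is multilingual-specific.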
Step 3: Retrain the Failing Model
· Augment the training dataset by:
o Adding more diverse examples per label.
o Ensuring proper multilabel pairings are included.
· Optionally switch to monolingual training if multilingual coverage is unnecessary.
Hope this helps. Do let me know if you have any further queries.
Thank you!