Preserving PDF formatting

Irene Hanning 20 Reputation points
2025-06-12T17:54:08.0233333+00:00

When I use a PDF for translation in Azure Language Studio, the formatting in the translated document is not preserved. (I'm translating from English to Spanish.) It looks like the translator is trying to place the Spanish text in the same amount of space used for the English, so it has to reduce the font size. In other cases, the text runs into and over pictures on the page. I've also trained a custom translation model with my industry's technical language. It appears to use the correct model when I translate a Word document, but it seems to ignore my custom model when I use a PDF. I'm guessing it's a compatibility issue. So, is there some way, or some setting I can change in my PDF, to make the translator preserve my formatting and use my custom model? (Note that saving the PDF as a Word document and translating the Word document is not an option. The formatting gets very messy.)

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

Accepted answer
  1. Chiugo Okpala 1,910 Reputation points MVP
    2025-06-13T05:12:15.66+00:00

    @Irene Hanning welcome to the Microsoft Q&A community.

    From your message, it seems like you're running into two separate issues: formatting preservation and custom model compatibility when translating PDFs in Azure Language Studio.

    For formatting preservation, the problem likely stems from how the translator handles text expansion. Spanish tends to take up more space than English, and the system tries to fit it within the original layout, sometimes shrinking fonts or misaligning text. While Azure Language Studio doesn’t have a built-in fix for this, you might try:

    • Using a PDF with flexible text boxes rather than fixed layouts.

    • Adjusting the document structure to allow more space for translated text.

    • Testing different PDF versions (e.g., PDF/A vs. standard PDF) to see if formatting is better preserved.
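
    To get a rough idea of how much extra space to allow, you could measure the expansion on a few sentences you have already translated. The sketch below is only an illustration: the function name and the sample sentences are hypothetical, and character count is a crude proxy for the space text occupies on the page.

```python
def expansion_ratio(pairs):
    """Return total translated length divided by total source length.

    `pairs` is a list of (source_text, translated_text) tuples.
    Character counts are only a rough proxy for rendered width.
    """
    source_chars = sum(len(src) for src, _ in pairs)
    target_chars = sum(len(tgt) for _, tgt in pairs)
    if source_chars == 0:
        raise ValueError("source text is empty")
    return target_chars / source_chars


# Hypothetical sample pairs; replace them with text from your own document.
samples = [
    ("Safety instructions", "Instrucciones de seguridad"),
    ("Replace the filter", "Reemplace el filtro"),
]

if __name__ == "__main__":
    ratio = expansion_ratio(samples)
    print(f"Translated text is about {ratio:.2f}x the source length")
```

    A ratio noticeably above 1 suggests leaving that much extra room in text boxes before translating.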

    For custom model compatibility, Azure’s Custom Translator should apply to both Word and PDF translations, but some users have reported inconsistencies. You might want to:

    • Check if your custom model is correctly linked to the document translation feature.

    • Verify that your PDF translation settings match those used for Word documents.

    • Reach out to Azure support for confirmation on whether custom models are fully supported for PDFs.
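
    One way to check the first two points is to bypass the Studio interface and submit the same job for a Word file and a PDF yourself. The sketch below assumes you call the Document Translation batch REST API directly; the field names (`inputs`, `sourceUrl`, `targetUrl`, `category`) reflect my understanding of that request body and should be verified against the current API reference. The helper name, URLs, and category ID are placeholders, and the code only builds the request body without sending it.

```python
import json


def build_translation_request(source_url, target_url, category_id,
                              source_language="en", target_language="es"):
    """Build a Document Translation batch request body as a plain dict.

    Assumption: the custom model is selected by passing its category ID
    in the `category` field of each target.
    """
    if not category_id:
        raise ValueError("category_id is required to select a custom model")
    return {
        "inputs": [
            {
                "source": {"sourceUrl": source_url, "language": source_language},
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": target_language,
                        "category": category_id,
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own container URLs and category ID.
    body = build_translation_request(
        "https://example.invalid/source-container",
        "https://example.invalid/target-container",
        "YOUR-CATEGORY-ID",
    )
    print(json.dumps(body, indent=2))
```

    If the same category ID produces custom terminology for the Word file but not for the PDF, that is concrete evidence to include in a support request.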

    You can find more details on Azure’s translation capabilities in the Azure documentation. If you need a deeper technical dive, you might check out related discussions on Microsoft Q&A.

    I hope this helps. Let me know if you have any further questions or need additional assistance.

    Also, if this answers your query, please click "Upvote" and "Accept the answer", which might be beneficial to other community members reading this thread.
