In-Place PDF Engine

Regulated Document DTP & PDF Rebuild

We rebuild regulated multilingual documents in-place: tax forms, clinical patient leaflets, patent filings, annual reports and CE safety packs. In plain terms, your translated document comes back looking identical to the original, in any language, even when there is no editable source file behind it. Our PDF engine extracts text at source coordinates, including the outlined vector glyphs that defeat OCR-only vendors, and writes the translation back at the same location. Native Arabic, Persian, Urdu, Pashto, CJK and Indic typesetting is handled in-house by trained DTP operators. We follow an ISO 17100 process, keep a per-span audit trail, and deliver print-ready files.

ISO 17100 + GDPR · Cardiff, UK

See a sample: a multilingual brochure rendered in 5 languages, with the same content and native typography in each script (PDF, 501 KB).

What we do

Specialist capabilities for Regulated Document DTP & PDF Rebuild

In-Place PDF Rebuild

Source-coordinate text replacement for complex PDFs. We translate tax returns, clinical forms, and regulatory documents without destroying the original layout — handling outlined vector glyphs that OCR-only vendors silently drop.

Regulatory Forms Specialism

French impots.gouv.fr, German CE safety documents, Spanish AEMPS, Italian Agenzia delle Entrate and UK HMRC templates. We have mapped how each authority stores text and what breaks when lesser tools try to translate it.

Clinical & Pharma Packs

Patient Information Leaflets (PIL), SmPC, Investigator Brochures, Informed Consent Forms — formatted to EMA, MHRA, and FDA template requirements with full audit trail.

Legal & Patent Filings

Foreign filing applications, patent claims, contract bundles, court exhibits. Layout-perfect translations ready for court submission or IP registration in 40+ jurisdictions.

Corporate Compliance Reports

Annual reports, sustainability filings, investor decks — formatted to LSE, AIM, and target-country disclosure rules. Pantone-matched, CMYK-ready, prepress-clean.

Brand-Locked Marketing Collateral

InDesign, Illustrator, Photoshop workflows with RTL mirroring, CJK vertical text, text-expansion handling. For when a rebrand lands in 12 languages the same day.

Deep dive · 01 / 06

In-place PDF rebuild — why this is the hardest job in DTP

The hardest DTP job is the one where there is no source file. A French tax return, a German CE safety leaflet, a clinical patient information leaflet (PIL), a patent filing — all of these are PDFs the client received from a regulator or an agency, with no editable InDesign or Word original behind them. Most translation agencies handle this by extracting the text into a Word document, translating, and sending the translation back as a fresh document that does not match the regulator's template. This is unacceptable for filings — the regulator expects the original form back, in the target language, looking exactly like the original.

1. The technical problem

PDFs do not store text as running text. They store it as a sequence of glyph-drawing operations: "draw character 67 from font Helvetica at coordinate (123, 456)". Most agencies use a PDF-to-Word converter that loses the coordinate information and the font references; the result is editable but visually unrelated to the source. The next level of agency uses Adobe Acrobat's "Export to InDesign"; this preserves more layout but breaks on three categories of content:

  • Outlined glyphs. Many regulatory forms (impots.gouv.fr, German Steuererklärung, AEMPS) outline their text labels to a vector path at publication. Acrobat's extractor reads outlined text as a graphic, not a string. The translator never sees the field label and the regulator never sees its template back.
  • Form fields. AcroForm and XFA fields have their own text and font references. Extracting the surrounding text but not the field labels produces a half-translated document.
  • Scanned hybrids. A PDF that is part vector, part raster (typical for regulator-supplied documents with a scanned signature block) needs OCR on the raster portion and coordinate-preserved extraction on the vector portion. Off-the-shelf tools typically handle one or the other, not both.
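
To make the glyph-operation model described above concrete, here is a minimal Python sketch that walks a simplified, uncompressed content stream and recovers each text span with its position. It is an illustration under stated assumptions, not our production engine: the sample stream, the `Span` type and `extract_spans` are hypothetical names, and only the `BT`, `Tf`, `Td` and `Tj` operators are interpreted (no transformation matrices, encodings or compression).

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    font: str
    size: float
    x: float
    y: float


# A token is a literal string "(...)", a name "/F1", a number, or an operator.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|/[^\s/()]+|[-+]?\d*\.?\d+|[A-Za-z*']+")


def extract_spans(stream: str) -> list[Span]:
    """Collect each text-showing operation with the position set before it."""
    spans, operands = [], []
    font, size, x, y = "", 0.0, 0.0, 0.0
    for tok in TOKEN.findall(stream):
        if tok[0] in "(/+-." or tok[0].isdigit():
            operands.append(tok)          # operands precede their operator
            continue
        if tok == "BT":                   # begin text: reset the position
            x, y = 0.0, 0.0
        elif tok == "Tf":                 # set font name and size
            font, size = operands[-2][1:], float(operands[-1])
        elif tok == "Td":                 # move by an offset from the line start
            x, y = x + float(operands[-2]), y + float(operands[-1])
        elif tok == "Tj":                 # show a string at the current position
            spans.append(Span(operands[-1][1:-1], font, size, x, y))
        operands.clear()
    return spans


# Hypothetical content stream for a two-label form fragment.
SAMPLE = "BT /F1 12 Tf 123 456 Td (Nom de famille) Tj 0 -18 Td (Date) Tj ET"
spans = extract_spans(SAMPLE)
```

Each recovered span carries the font, size and coordinates needed to write a translation back at the same place. Outlined glyphs never appear as `Tj` operations at all, which is why they need the separate path-matching step.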

2. Our engine

We built our PDF engine specifically for this category. It enumerates every text-drawing operation in the PDF, including outlined glyphs (by detecting common label fonts and reverse-mapping the path back to characters), reads form field labels and tooltips, runs OCR over raster regions, and produces a per-span translation manifest. The translator works in the manifest; the engine writes the translation back at the source coordinates. Form fields are preserved; the regulator sees the same form, in their target language, with the right field labels.
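
As a rough picture of what a per-span translation manifest could look like, the Python sketch below models one entry per extracted span and checks that nothing is left untranslated before write-back. The field names, the `kind` values and the helper functions are assumptions made for illustration; the actual manifest format is not described on this page.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ManifestEntry:
    span_id: str
    page: int
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 in PDF points
    kind: str          # hypothetical values: "text", "outlined", "form_label", "ocr"
    source: str
    target: str = ""   # filled in by the translator


def untranslated(entries: list[ManifestEntry]) -> list[str]:
    """Return the ids of spans that still lack a translation."""
    return [e.span_id for e in entries if not e.target.strip()]


def to_json(entries: list[ManifestEntry]) -> str:
    """Serialise the manifest so it can travel between translator and engine."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


manifest = [
    ManifestEntry("p1-s1", 1, (123.0, 456.0, 240.0, 468.0), "outlined",
                  "Nom de famille", "Family name"),
    ManifestEntry("p1-s2", 1, (123.0, 438.0, 160.0, 450.0), "form_label", "Date"),
]
pending = untranslated(manifest)
```

Keeping the bounding box alongside the source and target text is what lets the write-back step place each translation at its original coordinates.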

3. What this delivers

A French tax return that looks like a French tax return. A German CE leaflet that matches the German regulator's template. A patient information leaflet that the EMA accepts on first review. We were built for this; most agencies were not.

Deep dive · 02 / 06

RTL typesetting — Arabic, Persian, Urdu, Pashto

RTL typesetting is the single biggest skill gap in UK DTP. Most agencies own InDesign but do not have the Adobe World-Ready Paragraph Composer enabled, do not have an Arabic typesetter on staff, and outsource the work to a Cairo or Beirut studio. The round-trip introduces latency, version drift, and a translator-typesetter handoff that frequently breaks. We do this work in-house, in Cardiff, with native Arabic typesetters who have shipped this work for over a decade.

1. The Adobe World-Ready Composer

InDesign's default paragraph composer cannot handle Arabic correctly: it breaks ligatures, gets the line-break logic wrong, and fails to apply contextual forms. The fix is to enable the Adobe World-Ready Paragraph Composer (from the Paragraph panel menu in the Middle Eastern and North African edition of InDesign, or via a script in the standard edition). Most agencies running standard InDesign do not enable it and ship visually broken Arabic that a native reader will reject on sight.

2. Ligatures and contextual forms

Arabic letters change shape depending on whether they are isolated, initial, medial, or final in a word. A correctly typeset Arabic paragraph applies the correct form automatically through the OpenType init, medi, fina, and isol features. Fonts vary in how well they implement these features: Tajawal, Cairo, Noto Sans Arabic and Adobe Arabic are reliable. Many "Arabic" fonts on free repositories are not, and produce text that a native reader recognises as foreign-made at first glance.
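
The four positional shapes can be made visible with a short Python sketch. In real text only the base letter is stored and the font's OpenType features choose the shape; the sketch instead uses Unicode's legacy presentation-form code points for the letter beh, purely as an illustration, and shows that all four fold back to the same stored letter. The helper names are hypothetical.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the code point stored in the text

# Legacy presentation-form code points, one per positional shape of BEH.
PRESENTATION_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]


def describe_forms(forms: list[str]) -> list[str]:
    """Return the Unicode name of each presentation form."""
    return [unicodedata.name(ch) for ch in forms]


def base_letters(forms: list[str]) -> set[str]:
    """Fold each presentation form back to the letter it represents (NFKC)."""
    return {unicodedata.normalize("NFKC", ch) for ch in forms}


names = describe_forms(PRESENTATION_FORMS)
letters = base_letters(PRESENTATION_FORMS)
```

Here `names` lists the isolated, final, initial and medial forms in that order, and `letters` contains only `BEH`. A font with incomplete shaping features leaves the stored letter in the wrong shape, which is the defect a native reader spots immediately.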

3. Bidirectional flow in mixed content

An Arabic paragraph that contains an English brand name or a number flows in both directions. The Unicode Bidirectional Algorithm handles most cases, but punctuation at the boundary (a full stop after an English brand inside an Arabic sentence) is often placed on the wrong side of the brand. We use Unicode directional formatting marks (RLM, LRM) to anchor punctuation correctly, and we proof on the final layout — not in the InDesign editor, where the algorithm's behaviour differs from the exported PDF.
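
One common case can be sketched in Python: punctuation that belongs to a Latin brand name, such as the "!" in "Yahoo!", inside an Arabic sentence. The function name and the sample sentence are hypothetical, and this shows only the LRM case; punctuation that belongs to the Arabic sentence is anchored with an RLM in the mirror-image way.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def anchor_brand_punctuation(text: str, brand: str) -> str:
    """Insert an LRM after punctuation that belongs to a Latin brand name.

    In right-to-left text, a neutral character such as "!" that sits between
    a Latin run and Arabic text takes the paragraph direction and is drawn on
    the far side of the brand. An LRM after it places the mark between two
    left-to-right characters, so it stays attached to the brand.
    """
    if not brand or brand[-1].isalnum():
        return text  # the brand has no trailing punctuation to anchor
    pattern = re.escape(brand) + "(?!" + LRM + ")"  # skip already-anchored brands
    return re.sub(pattern, lambda m: m.group(0) + LRM, text)


# "Try Yahoo! now", built by concatenation so the logical order is explicit.
sentence = "جرب" + " Yahoo! " + "الآن"
anchored = anchor_brand_punctuation(sentence, "Yahoo!")
```

The mark is invisible and the function is safe to run twice, but the result still has to be proofed on the exported layout, since that is where placement errors show.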

4. Numerals: Arabic-Indic vs Western

Arabic content can use Arabic-Indic numerals (٠١٢٣٤٥٦٧٨٩) or Western numerals (0123456789); the choice depends on regional convention and client house style. We confirm the choice before starting and configure the InDesign character style to enforce it throughout the document. A document that mixes both systems reads as an error to a regional reader.
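
A consistency check of this kind is easy to sketch in Python. The function names and the `style` values below are assumptions for illustration; the sketch covers only the Arabic-Indic digits U+0660 to U+0669 and does not touch decimal or thousands separators.

```python
WESTERN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))  # U+0660..U+0669

TO_ARABIC_INDIC = str.maketrans(WESTERN, ARABIC_INDIC)
TO_WESTERN = str.maketrans(ARABIC_INDIC, WESTERN)


def normalise_digits(text: str, style: str) -> str:
    """Convert every digit in the text to the agreed numeral system."""
    if style == "arabic-indic":
        return text.translate(TO_ARABIC_INDIC)
    if style == "western":
        return text.translate(TO_WESTERN)
    raise ValueError(f"unknown numeral style: {style}")


def mixes_digit_systems(text: str) -> bool:
    """Flag text that contains both Western and Arabic-Indic digits."""
    has_western = any(ch in WESTERN for ch in text)
    has_arabic_indic = any(ch in ARABIC_INDIC for ch in text)
    return has_western and has_arabic_indic
```

Persian and Urdu use a separate block of digits (U+06F0 to U+06F9), so a per-language check would need its own table.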

5. Persian and Urdu

Persian and Urdu share the Arabic script but have different glyph shapes, additional characters (Persian's پ, Urdu's ٹ), and different typographic conventions. Nastaliq is the dominant script style for Urdu; Naskh is dominant for Persian. Treating all three languages as "Arabic" is the most common error; we staff per-language typesetters.

Deep dive · 03 / 06

CJK, Thai and Indic typesetting

Asian scripts each have a separate set of typographic requirements that Latin-trained typesetters routinely miss. We handle each in-house with operators trained on the script.

1. Chinese (Simplified and Traditional)

Chinese has no word spaces; line breaks can happen at almost any character boundary. InDesign's CJK composer handles this correctly when enabled. Line-break prohibition rules (kinsoku, in InDesign's terminology) need explicit configuration: closing punctuation such as 。 and ， cannot start a line, and opening brackets such as （ and 「 cannot end one. Vertical typesetting (for traditional contexts, books, and some marketing layouts) requires a separate composer and frame setup. Simplified Chinese uses one set of punctuation conventions and PRC orthography; Traditional Chinese uses another, and we proof against Taiwan or Hong Kong house style depending on the target market.
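
The prohibition rules can be sketched as a small greedy line breaker in Python. This is a simplified illustration, not InDesign's composer: the two character sets are a short, assumed subset, every character is treated as one unit wide, and a prohibited break is resolved by pushing a character onto the next line.

```python
# Assumed subsets for illustration; real kinsoku tables are longer.
NO_LINE_START = set("，。、；：？！）」』》】")
NO_LINE_END = set("（「『《【")


def break_lines(text: str, width: int) -> list[str]:
    """Greedy fixed-width breaking that respects the two prohibition sets."""
    lines, start = [], 0
    while start < len(text):
        end = min(start + width, len(text))
        # Pull the break back while it would strand a prohibited character.
        while end < len(text) and end - start > 1 and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


lines = break_lines("我们提供翻译。排版服务", 6)
```

A plain six-character cut would put the full stop 。 at the head of the second line; here the first line is shortened by one character so that the stop stays with the sentence it closes.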

2. Japanese

Japanese mixes three scripts (kanji, hiragana, katakana) with Latin characters and numbers. Ruby text (furigana — small phonetic readings above kanji) is required for some content categories (school materials, accessibility). Vertical typesetting is common in traditional contexts. Line-break rules are more permissive than Chinese but still have kinsoku constraints.

3. Korean

Korean Hangul uses word spaces (unlike Chinese and Japanese). Justification and tracking behave differently from Latin scripts; Korean readers expect tighter line spacing than the Latin default. Hanja (Chinese characters in Korean contexts) appear in academic and legal documents and require font fallback handling.

4. Thai

Thai is the script that breaks naive typesetters most often. There are no word spaces, so line breaks must be inferred from a dictionary. InDesign's Thai composer uses a built-in dictionary; for technical terms not in the dictionary, manual zero-width-space insertion is required. Tone marks stack above consonants and must never be separated from their base character at a line break. Most UK typesetters do not have a Thai operator on staff; we do.
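
Dictionary-driven breaking and zero-width-space insertion can be sketched in Python as follows. The two-word dictionary and the function names are hypothetical, and greedy longest-match is only one simple segmentation strategy; production Thai segmenters use far larger dictionaries and handle ambiguity better.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: an invisible line-break opportunity


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown runs are kept whole."""
    words, unknown, i = [], "", 0
    longest = max(map(len, dictionary), default=0)
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                if unknown:               # flush any pending unknown run first
                    words.append(unknown)
                    unknown = ""
                words.append(candidate)
                i += size
                break
        else:
            unknown += text[i]            # no dictionary word starts here
            i += 1
    if unknown:
        words.append(unknown)
    return words


def add_break_opportunities(text: str, dictionary: set[str]) -> str:
    """Join the segmented words with zero-width spaces."""
    return ZWSP.join(segment(text, dictionary))


TOY_DICTIONARY = {"ภาษา", "ไทย"}  # "language", "Thai"
words = segment("ภาษาไทย", TOY_DICTIONARY)
```

Because breaks are only offered between whole dictionary words, a tone mark can never be split from its consonant. Terms missing from the dictionary stay as one unbreakable run, which is where manual insertion comes in.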

5. Indic scripts: Devanagari, Bengali, Tamil, Telugu, Gujarati, Punjabi, Malayalam

Indic scripts use complex shaping: conjunct consonants combine into ligatures that depend on the consonant cluster and the active font's OpenType tables. A naïve typesetter will produce visually broken Devanagari where conjuncts fail to form. Noto Sans Devanagari, Adobe Devanagari, Devanagari Sangam MN (bundled with macOS), and a small set of others handle the shaping correctly. We test on the final font, not on a fallback. For technical documents we additionally test on the regional fonts that government regulators expect (e.g., Mangal for some Indian government forms).
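
To show what the font has to shape, the Python sketch below lists the places in a Devanagari word where a conjunct should form: a consonant, a virama, then another consonant. The function names are hypothetical and the consonant test covers only the basic range U+0915 to U+0939; whether a ligature actually appears still depends on the font.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA, which joins two consonants


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants KA..HA."""
    return "\u0915" <= ch <= "\u0939"


def conjunct_candidates(text: str) -> list[str]:
    """Return consonant + virama + consonant sequences the font must shape."""
    found = []
    for i in range(1, len(text) - 1):
        if text[i] == VIRAMA and is_consonant(text[i - 1]) and is_consonant(text[i + 1]):
            found.append(text[i - 1:i + 2])
    return found


def code_point_names(text: str) -> list[str]:
    """Spell out a sequence as Unicode character names."""
    return [unicodedata.name(ch) for ch in text]


# The word "kshatriya": KA, VIRAMA, SSA, TA, VIRAMA, RA, VOWEL SIGN I, YA.
word = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"
candidates = conjunct_candidates(word)
```

A list like this is a useful proofing aid: each candidate is a spot to check on the final font, where a visible virama instead of a joined form signals that shaping failed.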

Deep dive · 04 / 06

Text expansion, frame reflow and the German problem

Translation changes text length. German averages 30-40% longer than English. Arabic averages 25% longer. Russian averages 15-20% longer. Chinese is typically shorter. A naïve typesetter ships the German version with truncated buttons, overflowing text frames, and broken page breaks, and a regulator rejects the document on first review.

1. Our approach to expansion

  • Frame auto-sizing. Every text frame in the source InDesign file is set to auto-size to its content. The frame grows; the layout adapts. Fixed-size frames are the single biggest source of expansion-related bugs and the simplest to fix.
  • Style-sheet expansion budget. We publish a style sheet for each target language showing the maximum allowable expansion before manual reflow. Translators see the budget while translating; if a string overruns, they can request a reword or flag the constraint.
  • Reflow rules. We agree per-document reflow rules at scoping. Common rules: page breaks may shift; section headings may not split; tables may overflow to the next page; figure captions stay with their figure. The rules are documented and applied consistently.
  • Page-count change report. At delivery we publish a page-count comparison, for example: source 40 pages, German 48 pages, Arabic 47 pages, Chinese 36 pages. The client knows what they're getting before they open the file.
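
The expansion budget idea can be sketched as a simple per-string check in Python. The budget values are illustrative only (the upper ends of the averages quoted above), the function names are hypothetical, and character count is a rough proxy: whether a string really fits depends on the font and the frame.

```python
# Illustrative budgets only, in percent, keyed by target language code.
BUDGET_PCT = {"de": 40.0, "ar": 25.0, "ru": 20.0}


def expansion_pct(source: str, target: str) -> float:
    """Percentage growth of the target string relative to the source."""
    if not source:
        raise ValueError("source string is empty")
    return (len(target) - len(source)) / len(source) * 100


def over_budget(source: str, target: str, lang: str) -> bool:
    """True when a translation grows beyond the budget for its language."""
    return expansion_pct(source, target) > BUDGET_PCT[lang]


button = expansion_pct("Save", "Speichern")          # 4 -> 9 characters
flagged = over_budget("Save", "Speichern", "de")
```

Short interface strings are the usual offenders: "Save" to "Speichern" is a 125% expansion, far above any document-level average, which is why the check runs per string rather than per file.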

2. The pharma case

Pharma patient information leaflets are the high-stakes case. The EMA and MHRA require specific section headings, in a specific order, with no truncation. German expansion routinely pushes a 4-page PIL to 6 pages; the regulator accepts the page-count change as long as section integrity is preserved. We work to the EMA's QRD template for every PIL and proof against the current template version (the template changes; we monitor).

Deep dive · 05 / 06

Print specifications, colour management and PDF/X

A document that will be printed needs to leave us in exactly the print-ready state your printer expects. This is a separate skillset from translation and most translation agencies cannot do it; we have an in-house prepress operator and we deliver to spec.

1. Print specifications we support

  • PDF/X-1a:2001. The legacy standard: CMYK and spot colours only, no live transparency, all fonts embedded. Still required by some printers and most newspaper systems.
  • PDF/X-4. The modern standard: supports transparency natively, RGB and CMYK, and ICC colour profiles. Our default for new work; supported by most modern presses.
  • PDF/A. For long-term archiving (often required for regulator filings). PDF/A-1b is the conservative choice; PDF/A-2 and -3 add features.
  • Marks and bleed. Printer's marks, registration marks, colour bars and 3 mm bleed by default; configurable per print contract.

2. Colour management

CMYK conversion needs to know the target press condition: FOGRA39 or FOGRA51 for European coated stocks, GRACoL 2006 for North American, Japan Color 2011 for Japanese. We embed the correct ICC profile and soft-proof against the printer's own profile when one is supplied. Pantone spot colours are preserved as spots through to plate-making.

3. Font embedding and licensing

Every font used must be embedded in the final PDF (PDF/X requires it). Foundry licences vary in what they permit — Monotype, Linotype and Adobe Originals licences are explicit; some commercial fonts forbid embedding and need replacement before export. We audit the font stack and substitute where licensing requires.

4. Handover to your printer

We deliver the InDesign package (or the print-ready PDF, depending on scope), a Print Spec sheet documenting the bleed, marks, colour profile and stock recommendation, and a soft-proof PDF for visual sign-off. Your printer should be able to plate the file without back-and-forth.

Deep dive · 06 / 06

Multilingual desktop publishing services for regulated industries

Most of our DTP work is regulated content: pharma, legal, financial, government. Each category has its own deliverable standards and our process is built around them.

1. Pharma: PIL, SmPC, IB, ICF

Patient Information Leaflets and Summary of Product Characteristics follow the EMA QRD template; Investigator Brochures follow ICH E6 guidance; Informed Consent Forms follow the IRB or REC template. We work to the current template version (templates revise; we monitor) and deliver against the regulator's submission format (XML for SmPC, PDF for PIL, eCTD for full submissions). Per-span audit trails are mandatory for regulated pharma work; we publish them.

2. Patent and IP: PCT, EPO, USPTO

Patent filings have format requirements that differ per office. EPO accepts XML and PDF; USPTO requires specific font sizes and margins; PCT submissions have language-specific abstracts. We work to the office's current specification and deliver to that. Drawings are translated separately — figure callouts in the target language with the same numbering scheme.

3. Legal: court bundles, arbitration packs

UK court submissions need a specific bundle format (single PDF, paginated, indexed, hyperlinked). We rebuild bundles in target language with the same pagination and index structure. For international arbitration we work to the procedural order's specification, which varies per case.

4. Financial: annual report, prospectus, KID

Annual reports follow each market's listing rules (LSE, AIM, NASDAQ, Euronext) for required sections; prospectuses follow the Prospectus Regulation; KIDs follow the PRIIPs Regulation. We coordinate with your financial translation team (often ours, through our financial translation service) to ensure prose, numbers, and layout are in lock-step.

5. Government: tax, immigration, civil registration

This is the hardest category for in-place PDF rebuild because forms vary across local offices. We've handled French impots.gouv.fr, German Steuererklärung (multiple Bundesländer), Spanish AEMPS pharmacovigilance forms, Italian Agenzia delle Entrate, and several UK HMRC forms. Each office's PDF storage convention is documented in our internal library; we know what to expect before we start.

How it works

Our process, end to end

    1. Source Analysis & Risk Scan

    We run your PDF through our forensic extractor to identify text spans, outlined glyphs, embedded fonts, form fields, scanned regions, and regulatory template references. Before quoting, we already know what would break with a cheaper vendor.

    2. Specialist Assignment & Routing

    We assign a linguist with subject expertise — pharma for clinical, IP lawyer for patents, sworn translator for court bundles — and route content blocks through our LingoSecure workspace for tracked, audited delivery.

    3. In-Place Translation

    Our engine writes translations back at exact source coordinates. Text expansion (German +30%, Arabic +25%) is handled via frame auto-sizing and reflow rules that preserve visual balance.

    4. RTL & Complex-Script Formatting

    For Arabic, Persian, Urdu, Pashto: mirrored layouts, bidi-correct mixed content, native-device rendering verification. For CJK: vertical text support, ruby text, character-correct line breaking.

    5. Triple QA — Linguistic, Visual, Regulatory

    Second linguist signoff on meaning. Side-by-side visual comparison against source. Regulatory template check against the target authority's current specification, updated quarterly.

    6. Audit-Ready Delivery

    Print-ready PDFs (CMYK, PDF/X-1a or X-4), packaged source files, full translation memory export, and a per-span audit log you can hand to any regulator. Delivered via LingoSecure portal with encrypted download.

Who we serve

Built for these teams

Pharma & Life Sciences

  • Patient Information Leaflets (PIL)
  • SmPC and investigator brochures
  • Clinical trial protocols and ICFs
  • EMA / MHRA / FDA submissions
  • Medical device IFUs
  • Pharmacovigilance reports

Legal & IP

  • Foreign filing patent applications
  • Court exhibit bundles
  • International arbitration documents
  • Cross-border M&A disclosure packs
  • Trademark and design filings
  • Sworn document rebuilds

Financial & Corporate

  • Annual and sustainability reports
  • Investor presentations (LSE/AIM)
  • Fund prospectuses and KIDs
  • Tax return and filing packs
  • Corporate governance documents
  • Regulatory compliance bundles

Manufacturing & Engineering

  • Technical manuals (200+ pages)
  • CE / UL safety documentation
  • Assembly instructions and diagrams
  • Spare parts catalogues
  • RFP and tender responses
  • Maintenance and service guides

Why teams pick us

What you get with Lingo Service

We translate documents other vendors send back unformatted

Our PDF engine handles outlined vector glyphs, scan-then-print-then-scan hybrids, and forms storing labels as Arial-BoldMT outlines. This is why French tax returns usually come back broken — and why ours come back right.

Deadline is the deadline

Regulatory submissions don't move. We schedule backwards from your filing date with built-in linguistic and visual QA buffer, and we log every milestone in our LingoSecure portal.

Full audit trail

Every translation span is logged — who translated it, when, from what source, which engine assisted. When a regulator queries a phrase, we have the receipt.
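
As an indication of what one entry in such a log might contain, the Python sketch below builds a single JSON record for a translated span. The field names and the `audit_record` helper are assumptions for illustration, and storing SHA-256 digests rather than the text itself is a design choice of the sketch, not a description of the LingoSecure format.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(span_id, source, target, translator, engine, when=None) -> str:
    """Build one JSON line recording who translated a span, when, and from what."""
    when = when or datetime.now(timezone.utc)
    record = {
        "span_id": span_id,
        "translator": translator,
        "engine": engine,                 # which engine assisted, if any
        "timestamp": when.isoformat(),
        # Digests let a reviewer confirm the logged text has not changed.
        "source_sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "target_sha256": hashlib.sha256(target.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)


line = audit_record(
    "p1-s2", "Date", "Datum", translator="linguist-042", engine="none",
    when=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```

Appending one such line per span gives a log that can be filtered by span when a regulator queries a specific phrase.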

RTL is our home pitch

Arabic, Persian, Urdu, Pashto — we don't subcontract this. Mirrored layouts, bidi-correct mixed content, native device rendering checks. Our #1 language pair is EN↔AR.

FAQ

Frequently asked questions

Another vendor returned my French tax return or regulatory form in a broken layout. Can you fix it?
Yes. Send us the original and the broken output. We rebuild in-place from the source coordinates and return a PDF that matches the original template — including outlined vector glyphs that most UK agencies' tools silently drop. This is work others refuse because their workflows can't handle it; our engine was built for exactly this.
How is this different from ordinary DTP or translation?
Ordinary DTP needs editable source files (InDesign, Word) and rebuilds the layout manually — slow, expensive, error-prone for 200-page regulatory documents. Ordinary translation returns unformatted Word and leaves layout to you. Our approach works directly on the PDF, preserves coordinates, and handles form fields, outlined glyphs, and mixed scanned/vector content. Same or better quality, a fraction of the PM time.
Do you handle clinical trial documents for EMA, MHRA, and FDA submissions?
Yes. We handle Patient Information Leaflets, SmPC, Investigator Brochures, ICFs, and protocol translations, formatted to the authority's current template and delivered with a per-span audit log via our ISO 17100 and GDPR-compliant LingoSecure portal. We work directly with pharma sponsors and CROs.
Can you handle right-to-left languages for regulated documents?
Yes — Arabic, Persian (Farsi), Urdu and Pashto are our strongest specialism. We mirror layouts, handle bidirectional text (English numbers inside Arabic paragraphs), and verify font rendering on target devices. Our sister brand arabictranslation.co.uk has run these workflows since 2012 — nobody in the UK ships more Arabic layout work than we do.
How fast can you turn around a regulatory submission?
Timeline scales with complexity, not page count. A 40-page PIL in one language: 48 hours. A 200-page annual report in five languages: 10–14 working days. Tax returns with forms and outlined glyphs: 3–5 working days. Rush service available for regulatory filing deadlines — tell us the deadline first and we'll confirm what's feasible before quoting.
What file formats do you work with?
PDFs (including scanned, hybrid, and form-based), Adobe InDesign (.indd), Illustrator (.ai), Photoshop (.psd), Microsoft Word, PowerPoint, Excel, and regulatory XML (eCTD). Our in-place PDF engine handles the hard cases where no source file exists.
Will every translation be reviewed by a human?
Yes. Every span is reviewed by a qualified linguist with subject expertise before delivery. Our LingoSecure workspace accelerates the draft so the linguist's time goes to verifying technical terminology, regulatory phrasing, and layout-critical formatting. No unreviewed content ever leaves the portal.
How do you handle confidentiality for sensitive documents?
All files move through LingoSecure — our ISO 17100 and GDPR-compliant portal with AES-256 encryption at rest, TLS in transit, per-user audit logs, team glossaries, IP allowlisting on request, and optional bring-your-own-key (BYOK) for enterprise. We can execute a DPA before any file is uploaded.
Can you integrate with our translation memory or terminology database?
Yes. We work with your existing TMS (Phrase, formerly Memsource; Lokalise; Crowdin; Smartcat) or host your TM and glossaries inside LingoSecure. All translations are logged segment-by-segment so your TM grows with every job. You keep the TM; it is yours.
Do you work directly with in-house teams or only through procurement?
Both. We have framework agreements with public sector and large enterprise clients, and we also work directly with clinical case managers, patent attorneys, CFOs, and marketing leads who need a regulatory-grade result without the six-month procurement dance. Book a LingoSecure demo and you can be running a trial project in under a week.

Ready to talk?

Upload your files for an instant quote, or book a 15-minute call with a UK-based PM to scope a programme.

ISO 17100 accredited. GDPR-compliant. Based in Cardiff, UK.
