A Python library for reading, creating, and updating Microsoft Word
2007+ (.docx) files.
This repository is a fork of python-docx
by Steve Canny. It builds on their original work by extending coverage
to 100+ additional OOXML features — footnotes and endnotes, tracked
changes, bookmarks, fields, content controls, charts, equations,
SmartArt, watermarks, digital signatures, accessibility tooling, and
cross-document operations. Forked at upstream 1.2.0 (2025-06-16).
Credit for the foundational library goes to the original author.
pip install git+https://github.com/loadfix/python-docx.git
Requires Python 3.9+.
Not yet published to PyPI. Install from source only.
from docx import Document
document = Document()
document.add_paragraph("It was a dark and stormy night.")
document.save("dark-and-stormy.docx")
document = Document("dark-and-stormy.docx")
print(document.paragraphs[0].text)
# It was a dark and stormy night.
The package is imported as docx, matching upstream. Existing
upstream code runs unchanged against this fork.
See FEATURES.md for the full catalogue — 43 sections
covering every public capability, with fork additions marked
[Added in 2026.05.0].
Summary of areas extended beyond upstream 1.2.0:
- Footnotes, endnotes, and their numbering properties
- Tracked changes (read, accept, reject, insertions, deletions, moves, formatting changes, cell/row changes, revision IDs)
- Bookmarks (create, read, delete, cross-paragraph)
- Fields (simple, complex, REF/PAGEREF cross-references, DOCPROPERTY resolution, table of contents, list of figures/tables)
- Content controls (SDTs: rich text, plain text, date, checkbox, combo, dropdown, picture; custom XML data binding)
- Bibliography and citations (Document.bibliography, Document.add_citation, Paragraph.add_citation_reference; backed by the customXml/item{N}.xml + itemProps{N}.xml part pair)
- Form fields (text input, checkbox, dropdown)
- Cross-format linked content (Paragraph.link_to(target_url): Excel cells, Excel table columns, PowerPoint slides via INCLUDETEXT; Document.update_links() re-resolves via sibling xlsx/pptx)
- Charts (read + create for bar/line/pie; Chart.replace_data())
- SmartArt (read + create for list/cycle/process layout families)
- Equations (OMML read + builders for identifier, fraction, superscript, subscript, radical)
- Watermarks, captions, ink annotations, embedded OLE objects, alt-chunks
- Tables (borders, shading, margins, autofit, merged-cell helpers, style flags, caption/description, indent, row height, header rows, cross-document copy, CRUD on rows/columns/cells, Document.add_dataframe styled DataFrame import with optional pandas)
- Sections (page borders, line numbering, document grid, paper source, columns, text direction, odd/even and first-page header/footer, copy between sections)
- Images (PNG/JPEG/GIF/BMP/TIFF/SVG/WebP/EMF/WMF/EPS; linked, floating, outline, crop, opacity, shadow, alt text, delete, replace)
- Shapes (preset DrawingML shapes, text boxes, canvas)
- Numbering (custom definitions, restart, rendered list labels)
- Styles (cross-document import, builtin latent materialisation, document-default font, next-paragraph auto-apply)
- Fonts (cs size, character scale, ligatures, shading, borders, language, East Asian layout, symbols, ruby)
- Accessibility (alt text, heading-structure validation)
- Search and replace (plain, regex, across tables/headers/footers/footnotes)
- CSS-selector queries (Document.select / Document.select_one: paragraphs, runs, tables, hyperlinks, bookmarks, comments by attribute / combinator / pseudo-class)
- Cross-document operations (append_document, add_table_copy, copy_header_from)
- Semantic diff (Document.diff(other), three granularity levels, Markdown / HTML / Word output formats; review-friendly compare for PR workflows)
- Packaging (.dotx/.dotm templates, Strict OOXML translation, Flat-OPC read/write, reproducible save, huge_tree opt-in, recover mode, Document.repair() best-effort recovery for damaged packages, password-protected read/write via optional python-ooxml-crypto, Document.stream() bounded-memory reader for very large documents, Document.from_html() / from_html_string() HTML import, os.PathLike support)
- Exporters (Document.to_html() minimal HTML5, Document.to_markdown() GitHub-Flavoured Markdown, Document.save_as_pdf_a(path, level="3a") best-effort PDF/A archival export with XMP pdfaid metadata; opt-in via pip install 'python-docx[pdfa]')
- Settings and metadata (compat flags, view, mail merge, Document.extended_properties, doc vars, page stats, spell/grammar toggles, auto-hyphenation, timezone-aware comments)
- Themes, web settings, font table (with font embedding), glossary, digital-signature detection
- High-level authoring helpers under docx.kit: pattern-level compositions over the primitive APIs. Lives under the optional [kit] extras flag (pip install python-docx[kit]). Ships:
  - docx.kit.front_matter (title page, copyright page, dedication, preface, table of contents, list of figures, list of tables)
  - docx.kit.chapter.add_chapter_opener (section break + Heading 1 title + epigraph + decorative image + drop cap)
  - docx.kit.dividers (add_divider / add_fleuron / add_three_stars / add_chapter_break for section dividers and chapter ornaments: fleurons, three-stars, dashed/dotted/wave/line breaks)
  - docx.kit.letterhead.set_letterhead (branded header + footer with three styles)
  - docx.kit.resume (resume_chronological / resume_functional / resume_technical factories returning fully-styled CV documents in three visual styles: modern / classic / minimal)
  - docx.kit.mail_merge.merge (bulk-render N personalised documents from a single template + iterable of records)
  - docx.kit.contracts (nda / msa / sow / contractor_agreement boilerplate factories; starting points only, not legal advice)
  - docx.kit.invoices (invoice / quote / statement factories with AUS GST defaults: 10% GST, override per-line via gst_rate=0 for international callers, auto-computed subtotal / GST / grand total, right-aligned line-item table; output complies with ATO tax-invoice rules when the seller carries an ABN)
  - docx.kit.memos (investment_memo with McKinsey-style SCQA executive summary, and business_case with options-analysis table)
  - docx.kit.templates (brief / coe / rfp_response / white_paper document-template registry covering short briefs, Centre of Excellence charters, RFP responses with a pricing table, and white papers with abstract and references)
  - docx.kit.scientific (ieee_paper / acm_paper / apa_paper / nature_paper scientific-paper template factories; IEEE / Nature switch the body to two-column layout, APA applies double line spacing, ACM stays single-column for the acmart stylesheet)
  - docx.kit.legal (court_paper / brief / declaration / table_of_authorities legal industry template factories with Federal Court of Australia / NSW Supreme Court front-sheet layout, Word built-in line numbering via w:sectPr/w:lnNumType, and a live TOA complex field; starting points only, not legal advice)
  - docx.kit.medical (soap_note / discharge_summary / referral_letter clinical-note template factories with Subjective / Objective / Assessment / Plan structure and a structured vitals table; template only, not a medical record)
  - docx.kit.coe (coe(doc, ...): Correction of Error / post-mortem template that appends an incident metadata block, summary, timeline table, Five Whys table, contributing-factors list, action-items table, and lessons-learned list to an existing document)
  - docx.kit.brand (BrandAssets.load(yaml_path): YAML-driven manifest loader for brand colours, font pairs, logo path variants, and conventional spacing values; composes with set_letterhead, add_chapter_opener, and the rest of the kit so an organisation declares its brand once and reuses it everywhere). BrandAssets.load additionally needs PyYAML, which the optional [brand] extras pulls in (pip install 'python-docx[brand]').
  - validate_brand (brand-guideline linter that walks a document and returns BrandFinding records covering font / colour / logo / heading-style / spacing drift against a YAML, dict, or BrandAssets-shaped palette)
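The totals that docx.kit.invoices is described as computing (subtotal, GST, grand total, with a 10% default and a per-line gst_rate override) can be sketched with the standard library alone. This is an illustrative calculation, not the kit's implementation: the invoice_totals helper, the dict-shaped line items, and the choice to round once at the end are all assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
DEFAULT_GST_RATE = Decimal("0.10")  # 10% GST default, as the feature list describes


def invoice_totals(lines):
    """Return (subtotal, gst, grand_total) for an iterable of line items.

    Each line is a dict with "quantity" and "unit_price", plus an optional
    "gst_rate" override. This shape is hypothetical, chosen for illustration.
    """
    subtotal = Decimal("0")
    gst = Decimal("0")
    for line in lines:
        # Go through str() so float inputs do not leak binary rounding noise.
        amount = Decimal(str(line["quantity"])) * Decimal(str(line["unit_price"]))
        rate = Decimal(str(line.get("gst_rate", DEFAULT_GST_RATE)))
        subtotal += amount
        gst += amount * rate
    # Assumption: round to cents once, after summing, using half-up rounding.
    subtotal = subtotal.quantize(CENT, rounding=ROUND_HALF_UP)
    gst = gst.quantize(CENT, rounding=ROUND_HALF_UP)
    return subtotal, gst, subtotal + gst


lines = [
    {"quantity": 2, "unit_price": "50.00"},                  # default 10% GST
    {"quantity": 1, "unit_price": "100.00", "gst_rate": 0},  # GST-free line
]
subtotal, gst, total = invoice_totals(lines)
```

Decimal is used rather than float so that currency amounts stay exact; whether the kit rounds per line or per invoice is not stated in this README.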
API and user-guide documentation lives under docs/ and builds with
Sphinx. The theme is Furo.
pip install Sphinx furo
python -m sphinx -b html docs docs/_build/html
Document.save(path, reproducible=True) produces a byte-identical
.docx for byte-identical inputs across machines and runs:
from docx import Document
doc = Document()
doc.add_paragraph("Hello")
doc.save("out.docx", reproducible=True)
The flag stamps every zip member with the fixed 1980-01-01 timestamp,
emits members in sorted order, normalises external file attributes,
and disables the rsid-family churn attributes that Word otherwise
mints on every save. These are the four sources of cross-machine and
cross-session nondeterminism. Use it for source-control-friendly diffs,
fixture regeneration, and content-addressable artefact pipelines.
The matching keyword is also accepted by the sibling python-pptx,
python-xlsx, and python-vsdx parents so cross-format build
pipelines share a single idiom (issue #150).
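The zip-level part of that recipe (fixed timestamp, sorted member order, normalised external attributes) can be illustrated with the standard zipfile module. This is a minimal sketch of the general technique, not the fork's code: write_reproducible_zip is a hypothetical helper, the 0o644 permission value is an assumed normalisation, and the rsid handling is out of scope here.

```python
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store


def write_reproducible_zip(members: dict[str, bytes]) -> bytes:
    """Serialise `members` (name -> content) to zip bytes deterministically."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(members):  # stable member order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)  # fixed timestamp
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # assumed normalised permissions
            info.create_system = 3  # pin the "host OS" byte, which varies by platform
            archive.writestr(info, members[name])
    return buffer.getvalue()


first = write_reproducible_zip({"b.xml": b"<b/>", "a.xml": b"<a/>"})
second = write_reproducible_zip({"a.xml": b"<a/>", "b.xml": b"<b/>"})
```

Within one environment the two results are byte-identical regardless of input ordering. Across machines the compressed bytes can still depend on the zlib build, so a real implementation may need to pin or account for that as well.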
Document.save(path, compatibility="Word 2003") opts into Word's
"compatibility mode" by stamping compatibilityMode on
settings.xml/w:compat and best-effort filtering features the older
client cannot render:
from docx import Document
doc = Document()
doc.add_paragraph("Compatible with Word 2003.")
doc.save("legacy.docx", compatibility="Word 2003") # val=11
doc.save("modern.docx", compatibility="Word 2016") # val=16
Accepted labels: "Word 2003" → 11, "Word 2007" → 12, "Word 2010"
→ 14, "Word 2013" → 15, "Word 2016" → 16 (raw ints are also
accepted). Targeting Word 2003 / 2007 strips the modern
threaded-comments parts (commentsIds.xml, commentsExtensible.xml,
commentsExtended.xml); newer targets only write the
compatibilityMode setting. The flag is best-effort: it tells
Word to open the file as if it had been authored under the older
release, but features the older client cannot render (SmartArt, OMML
equations, content controls, …) are left in the package and may show
as placeholders. Closes #94.
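To check which mode a saved file actually carries, the setting can be read back without any docx dependency. The sketch below assumes the usual Transitional layout, where word/settings.xml holds a w:compat element containing a w:compatSetting whose w:name is compatibilityMode; read_compatibility_mode is a hypothetical helper, not part of this library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (Strict packages use a different URI).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_compatibility_mode(docx_file):
    """Return the compatibilityMode value as an int, or None if it is absent.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/settings.xml"))
    for setting in root.iterfind(f"{{{W_NS}}}compat/{{{W_NS}}}compatSetting"):
        if setting.get(f"{{{W_NS}}}name") == "compatibilityMode":
            return int(setting.get(f"{{{W_NS}}}val"))
    return None


# Build a tiny stand-in package so the sketch runs on its own.
settings_xml = (
    f'<w:settings xmlns:w="{W_NS}"><w:compat>'
    '<w:compatSetting w:name="compatibilityMode" '
    'w:uri="http://schemas.microsoft.com/office/word" w:val="11"/>'
    "</w:compat></w:settings>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr("word/settings.xml", settings_xml)
sample.seek(0)
mode = read_compatibility_mode(sample)
```

If the save behaves as described above, calling the helper on legacy.docx should report 11 and on modern.docx 16.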
A central design goal of this fork is round-trip fidelity — load a
real-world .docx, mutate a few elements, save, and have nothing else
change. Charts, comments, custom XML parts, math, ink, signatures,
bibliography, and the rest of the loadfix-extended feature surface
must all survive.
The cross-monorepo round-trip gate lives at
tests/round_trip/ and runs as the
round-trip-fidelity CI job. The full per-feature support matrix
(what's "fully preserved" / "preserved with caveats" / "lossy")
across all four parent formats lives at
docs/round-trip-fidelity.md.
Unstable. Not yet published to PyPI. Current version: 2026.05.10
(first release as an independent fork). Versioning is CalVer
(YYYY.MM.patch). Public API tracks upstream 1.2.0 for the
inherited surface; fork additions are considered experimental until
the next calendar release.
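For tooling that needs to compare releases, a YYYY.MM.patch string can be split into an orderable tuple. This is a small sketch under the assumption that the month is always two digits, as in 2026.05.10; parse_calver is a hypothetical helper, not something the package exports.

```python
import re

CALVER = re.compile(r"(\d{4})\.(\d{2})\.(\d+)")


def parse_calver(version: str) -> tuple[int, int, int]:
    """Split a YYYY.MM.patch string into (year, month, patch) integers."""
    match = CALVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a YYYY.MM.patch version: {version!r}")
    year, month, patch = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {version!r}")
    return year, month, patch


current = parse_calver("2026.05.10")
```

Tuples compare element by element, so parse_calver("2026.05.10") > parse_calver("2026.05.9") holds even though the strings would sort the other way.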
Issues and pull requests are tracked at https://github.com/loadfix/python-docx/issues. Please file issues against this fork; upstream's tracker is for upstream-shared concerns only.
When contributing:
- Run the tests: pytest tests/ -q and uv run behave features/.
- Keep FEATURES.md current when adding, modifying, or removing public API (see CLAUDE.md for contributor conventions).
- Consult spec/ (XSD schemas and the ISO/IEC 29500 PDFs) for authoritative element ordering and cardinality when implementing new CT_* classes.
MIT. See LICENSE. Inherited from upstream python-openxml/python-docx.
Part of a family of document-rendering libraries:
- docxjs — browser-side DOCX → HTML renderer (TypeScript)
- pptxjs — browser-side PPTX → HTML renderer (TypeScript)
- xlsxjs — browser-side XLSX → HTML renderer (TypeScript)
- python-pptx — Python PPTX parser/generator
- python-xlsx — Python XLSX parser/generator
- ooxml-validate — Python/.NET OOXML validator (wraps Microsoft Open XML SDK + LibreOffice)