Skip to content

loadfix/python-xlsx

 
 

Repository files navigation

python-xlsx

A Python library for reading, creating, and updating Microsoft Excel 2007+ (.xlsx / .xlsm) files.

This repository is a fork of openpyexcel by the Sciris team, itself a snapshot of openpyxl 2.5.14 (circa 2019) by Eric Gazoni, Charlie Clark, and contributors. It builds on their original work with modern Excel 365 capabilities — round-trip preservation of unknown parts and <extLst> extensions, dynamic-array spill metadata, _xlfn / _xlws / _xlpm formula-prefix handling for post-2010 functions, modern error codes, rich-text cells, pivot builder, chartEx 2013+ chart types, first-class sparklines, shapes, threaded comments, SHA-512 password hashing, optional agile encryption, and a src/xlsx/ rename. Credit for the foundational library goes to the original authors.

Installation

pip install git+https://github.com/loadfix/python-xlsx.git

Optional extras:

pip install "git+https://github.com/loadfix/python-xlsx.git#egg=python-xlsx[encryption]"

Requires Python 3.10+. Not yet published to PyPI; install from source only.

Usage

from xlsx import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "It was a dark and stormy night."
ws.append([1, 2, 3])
wb.save("dark-and-stormy.xlsx")

wb = load_workbook("dark-and-stormy.xlsx")
print(wb.active["A1"].value)
# It was a dark and stormy night.

The package is imported as xlsx, matching the sibling series (docx, pptx). Code written against the fork-of-fork's openpyexcel import name must migrate to xlsx.

Streaming writes for huge workbooks

For workbooks with millions of rows the eager Workbook.save() path materialises the full sheet tree before serialising and OOMs past ~100k rows. Use Workbook.stream(path) instead — rows are emitted straight into the zip as they arrive, peak memory stays bounded regardless of cardinality:

from xlsx import Workbook

with Workbook.stream("huge.xlsx") as wb:
    ws = wb.add_sheet("data")
    ws.set_column_widths({1: 12, 2: 30})
    ws.freeze_panes(top_row=2)
    ws.append_row(["ID", "Name", "Revenue"])  # header
    for record in iter_records():            # millions of rows
        ws.append_row([record.id, record.name, record.revenue])

The streaming path supports plain values (str, int, float, bool, datetime/date, None), implicit shared-strings dedupe (capped at ~10k entries; spill to inline strings beyond), pre-row column widths, frozen panes, and a header marker. It does not emit charts, drawings, conditional formatting, pivots, data validations, or formulas — use the eager Workbook.save() path when those features matter. Mutators called after the first append_row raise xlsx.StreamingFrozenError.

See xlsx.writer.streaming for the full contract.

API

See FEATURES.md for the full public-API catalogue. API and user-guide documentation lives under docs/ (Sphinx, furo theme); it dates to the openpyxl 2.5 era and is being retooled for this fork — CLAUDE.md at the repo root is the canonical orientation document for contributors until a fresh docs site lands.

Summary of areas extended beyond the 2019-era openpyxl fork point:

  • Round-trip preservation — zip-level pass-through of unknown xlsx parts + <extLst> raw-bytes preservation across ~100 element types. Sparklines, slicers, timelines, rich data types, threaded comments, chartEx, ribbon customisations, and custom XML parts survive load / save even when the library has no typed model for them.
  • Dynamic arrays — first-class ArrayFormula with cm="1" / t="array" spill metadata, synthesised xl/metadata.xml part, and the spill operator A1# in the tokenizer.
  • Modern formula prefixes — transparent _xlfn. / _xlfn._xlws. / _xlpm. handling for ~150 post-2010 functions (XLOOKUP, FILTER, UNIQUE, SORT, LAMBDA, LET, ...), including LAMBDA parameter scope.
  • Modern error codes#SPILL!, #CALC!, #FIELD!, #BLOCKED!, #CONNECT!, #BUSY!, #UNKNOWN!, #EXTERNAL!.
  • CellsCellRichText for multi-run mixed-formatting values, read and write via the shared-strings table.
  • TablesWorksheet.add_table() reads the header row, seeds TableColumn names, wires totals-row aggregate helpers (sum/avg/count/max/min/etc.), supports calculated-column formulas. Worksheet.add_table_from_dataframe(df, name, location, …) is the pandas-native companion: writes the DataFrame, registers a real <table> part, and accepts a total_row_aggs map (sum/mean/count/ max/min — pandas spelling), per-column number_formats (the currency/pct2/raw-Excel-code DSL), and column_widths="auto" via the autofit helper. Pandas is an optional dep — install with pip install python-xlsx[dataframe].
  • PivotsWorkbook.create_pivot(source, target, rows=, cols=, values=, filters=) builds the pivot cache, cache records, and pivot table definition from a source range.
  • Power Pivot data modelwb.data_model.add_table(name, source=), add_relationship(from_table=, from_column=, to_table=, to_column=), and add_measure(name=, table=, expression=, format=) author a tabular data model (DAX expressions persisted verbatim) at xl/model/data-model.xml. Round-trips cleanly through python-xlsx; Excel-authored Power Pivot payloads continue to pass through byte-for-byte.
  • Power Query authoringwb.add_query(name, source, transform_steps=, refresh_on_open=) emits an xl/connections.xml entry plus a <DataMashup> customXml sidecar carrying the M-language source. Eight transform-step kinds (parse_csv / parse_json / rename_columns / filter / group_by / join / sort / limit); URL / file / named-range source bindings. Issue #104.
  • Filters and sort — rich AutoFilter helpers (add_top10, add_dynamic_filter, add_custom_filter, add_color_filter, add_icon_filter, add_date_group_filter) and SortState helpers (sort_by_value, sort_by_color, sort_by_icon).
  • Conditional formatting — modern data-bar options (negativeFillColor, negativeBorderColor, axisColor, axisPosition, gradient, direction, border) and custom icon-set icons; x14 extension round-trip. ws.conditional_format(range, kind=...) wizard with twelve+ named kinds ("heatmap", "data_bars", "icon_set", "top_n"/"bottom_n", "above_avg"/ "below_avg", "duplicates"/"unique", "greater_than"/ "less_than"/"equal_to", "text_contains", "formula") and a semantic colour palette ("success"/"warning"/"danger"/"primary"/ "accent").
  • Charts — classic 2007 chart family preserved. Modern chartEx types for Excel 2013+ (treemap, sunburst, funnel, waterfall, box-whisker, histogram, regionMap); drawing anchors round-trip.
  • Sparklines — first-class Sparkline / SparklineGroup model with ws.add_sparkline(location, data, type=); line / column / stacked (win-loss) types.
  • Drawings — first-class Shape with preset geometry (180+ presets), fill, line, and text; ws.add_shape() creation API; grouped-shape / connector / SmartArt / form-control / content-part preservation.
  • Comments — threaded comments (ThreadedComment, PersonList, replies, resolved state) alongside the legacy Comment API. Ergonomic authoring: ws['B2'].comment(text, author=...) for legacy notes, ws['B3'].comment_thread().add(text, author=...).reply(...) for Excel-365 threads, plus ws.comment_threads() iteration.
  • Protection — SHA-512 password hashing (set_password(..., sha512=True)) matching Excel 2013+ output; legacy 16-bit hash retained for back-compat.
  • Encryption — load and save password-protected .xlsx via python-ooxml-crypto (the [encryption] extra).
  • Column sizingws.autofit_columns() measures displayed values with Pillow font metrics.
  • Document propertiesdocProps/app.xml round-trips verbatim; docProps/custom.xml custom properties preserved.
  • Formula audit / lintwb.audit_formulas(rules=None) returns a list of FormulaFinding (severity / sheet / cell_ref / rule_id / message). Seven shipped rules: broken-ref, circular-ref, range-extends-past-data, volatile-function, inconsistent-formula, unused-named-range, external-link-stale. Sister of Worksheet.audit_styles() and the Document.lint() / Page.lint() siblings in python-docx / python-vsdx. CLI hook: python -m xlsx audit-formulas <file.xlsx>.
  • Streaming readerWorkbook.stream_read(path) returns a bounded-memory StreamingWorkbook context manager. Sheets expose a forward-only rows generator that yields tuples of cell values (str / int / float / bool / None); the underlying lxml.etree.iterparse clears each <row> element after yielding so resident memory stays flat regardless of row count. Use this when you need to scan a workbook whose row count would dwarf the eager loader (~1M+ rows). Cells are read-only — set_value raises StreamingNotMutableError; for editing, formulas, styles, comments, merged cells, or hyperlinks, use the eager load_workbook.
  • Semantic diffold.diff(new, level="content") returns a WorkbookDiff describing how two workbooks differ. Three levels: "structural" (sheet add/remove only), "content" (per-cell value
    • formula diffs, default), "formatting" (adds per-cell number- format / font / fill / border / alignment / protection diffs). Renderers: .summary (counts dict), .changes (ordered Change records), .to_markdown() (PR-comment friendly), .to_html(), and .to_xlsx_with_changes_highlighted() (a third .xlsx with changed cells shaded by kind).
  • Python 3.10+ onlyopenpyexcel.compat Python-2 shim removed; package renamed to xlsx, moved to src/xlsx/ with a py.typed marker.

Reproducible builds

Workbook.save(path, reproducible=True) produces a byte-identical .xlsx for byte-identical inputs across machines and runs:

from xlsx import Workbook

wb = Workbook()
wb.active["A1"] = "Hello"
wb.save("out.xlsx", reproducible=True)

The flag stamps every zip-member with the fixed 1980-01-01 timestamp, emits members in sorted order, and normalises external file attributes — the three sources of cross-machine nondeterminism a plain zip_date_time= keyword does not cover. Use it for source- control-friendly diffs, fixture regeneration, and content-addressable artefact pipelines; use zip_date_time= directly when you want a custom timestamp without sort or attribute normalisation. The matching reproducible= keyword is also accepted by python-docx, python-pptx, and python-vsdx so cross-format build pipelines share the same idiom (issue #150).

Documentation

The Sphinx-built site is auto-deployed to https://loadfix.github.io/python-xlsx/ on every push to master.

To build locally:

pip install sphinx furo myst-parser
python -m sphinx -b html docs docs/_build/html
open docs/_build/html/index.html

Round-trip support

Editing real-world .xlsx workbooks (pivots, conditional formatting, charts, threaded comments, custom XML, signatures, encryption) without disturbing unrelated parts is a core fork goal. The local fidelity suite under tests/roundtrip/ exercises Excel-365 fixtures; the cross-monorepo OPC-level gate lives at tests/round_trip/ and runs as the round-trip-fidelity CI job.

The per-feature support matrix (what's "fully preserved" / "preserved with caveats" / "lossy") across all four parent formats lives at docs/round-trip-fidelity.md.

Status

Unstable. The public API is still being shaped as the fork modernises toward Excel-365 fidelity; expect incompatible changes before a 1.0 line. Not yet published to PyPI. Current version: 2026.05.0 (CalVer, YYYY.MM.patch).

Contributing

Issues and pull requests are tracked at https://github.com/loadfix/python-xlsx/issues. See CONTRIBUTING.md for workflow notes and CLAUDE.md for architectural orientation. Run the unit suite with .venv/bin/python -m pytest src/xlsx/ -q and the round-trip fidelity harness with .venv/bin/python -m pytest tests/roundtrip/ -v before opening a PR.

License

MIT. See LICENCE.md. Inherited from openpyxl via openpyexcel.

Related projects

Part of a family of document-rendering libraries:

  • docxjs — browser-side DOCX → HTML renderer (TypeScript)
  • pptxjs — browser-side PPTX → HTML renderer (TypeScript)
  • xlsxjs — browser-side XLSX → HTML renderer (TypeScript)
  • python-docx — Python DOCX parser/generator
  • python-pptx — Python PPTX parser/generator
  • ooxml-validate — Python/.NET OOXML validator (wraps Microsoft Open XML SDK + LibreOffice)

About

Create and modify Excel workbooks with Python

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • Python 100.0%