A Python library for reading, creating, and updating Microsoft Excel
2007+ (.xlsx / .xlsm) files.
This repository is a fork of openpyexcel
by the Sciris team, itself a snapshot of
openpyxl 2.5.14 (circa 2019)
by Eric Gazoni, Charlie Clark, and contributors. It builds on their
original work with modern Excel 365 capabilities — round-trip
preservation of unknown parts and <extLst> extensions, dynamic-array
spill metadata, _xlfn / _xlws / _xlpm formula-prefix handling
for post-2010 functions, modern error codes, rich-text cells, pivot
builder, chartEx 2013+ chart types, first-class sparklines, shapes,
threaded comments, SHA-512 password hashing, optional agile
encryption, and a src/xlsx/ rename. Credit for the foundational
library goes to the original authors.
pip install git+https://github.com/loadfix/python-xlsx.git
Optional extras:
pip install "git+https://github.com/loadfix/python-xlsx.git#egg=python-xlsx[encryption]"
Requires Python 3.10+. Not yet published to PyPI; install from source only.
from xlsx import Workbook, load_workbook
wb = Workbook()
ws = wb.active
ws["A1"] = "It was a dark and stormy night."
ws.append([1, 2, 3])
wb.save("dark-and-stormy.xlsx")
wb = load_workbook("dark-and-stormy.xlsx")
print(wb.active["A1"].value)
# It was a dark and stormy night.The package is imported as xlsx, matching the sibling series
(docx, pptx). Code written against the fork-of-fork's
openpyexcel import name must migrate to xlsx.
For workbooks with millions of rows the eager Workbook.save() path
materialises the full sheet tree before serialising and OOMs past
~100k rows. Use Workbook.stream(path) instead — rows are emitted
straight into the zip as they arrive, peak memory stays bounded
regardless of cardinality:
from xlsx import Workbook
with Workbook.stream("huge.xlsx") as wb:
ws = wb.add_sheet("data")
ws.set_column_widths({1: 12, 2: 30})
ws.freeze_panes(top_row=2)
ws.append_row(["ID", "Name", "Revenue"]) # header
for record in iter_records(): # millions of rows
ws.append_row([record.id, record.name, record.revenue])The streaming path supports plain values (str, int, float,
bool, datetime/date, None), implicit shared-strings dedupe
(capped at ~10k entries; spill to inline strings beyond), pre-row
column widths, frozen panes, and a header marker. It does not
emit charts, drawings, conditional formatting, pivots, data
validations, or formulas — use the eager Workbook.save() path when
those features matter. Mutators called after the first append_row
raise xlsx.StreamingFrozenError.
See xlsx.writer.streaming for the full contract.
See FEATURES.md for the full public-API catalogue.
API and user-guide documentation lives under docs/ (Sphinx, furo
theme); it dates to the openpyxl 2.5 era and is being retooled for
this fork — CLAUDE.md at the repo root is the canonical orientation
document for contributors until a fresh docs site lands.
Summary of areas extended beyond the 2019-era openpyxl fork point:
- Round-trip preservation — zip-level pass-through of unknown xlsx
parts +
<extLst>raw-bytes preservation across ~100 element types. Sparklines, slicers, timelines, rich data types, threaded comments, chartEx, ribbon customisations, and custom XML parts survive load / save even when the library has no typed model for them. - Dynamic arrays — first-class
ArrayFormulawithcm="1"/t="array"spill metadata, synthesisedxl/metadata.xmlpart, and the spill operatorA1#in the tokenizer. - Modern formula prefixes — transparent
_xlfn./_xlfn._xlws./_xlpm.handling for ~150 post-2010 functions (XLOOKUP, FILTER, UNIQUE, SORT, LAMBDA, LET, ...), including LAMBDA parameter scope. - Modern error codes —
#SPILL!,#CALC!,#FIELD!,#BLOCKED!,#CONNECT!,#BUSY!,#UNKNOWN!,#EXTERNAL!. - Cells —
CellRichTextfor multi-run mixed-formatting values, read and write via the shared-strings table. - Tables —
Worksheet.add_table()reads the header row, seedsTableColumnnames, wires totals-row aggregate helpers (sum/avg/count/max/min/etc.), supports calculated-column formulas.Worksheet.add_table_from_dataframe(df, name, location, …)is the pandas-native companion: writes the DataFrame, registers a real<table>part, and accepts atotal_row_aggsmap (sum/mean/count/ max/min — pandas spelling), per-columnnumber_formats(thecurrency/pct2/raw-Excel-code DSL), andcolumn_widths="auto"via the autofit helper. Pandas is an optional dep — install withpip install python-xlsx[dataframe]. - Pivots —
Workbook.create_pivot(source, target, rows=, cols=, values=, filters=)builds the pivot cache, cache records, and pivot table definition from a source range. - Power Pivot data model —
wb.data_model.add_table(name, source=),add_relationship(from_table=, from_column=, to_table=, to_column=), andadd_measure(name=, table=, expression=, format=)author a tabular data model (DAX expressions persisted verbatim) atxl/model/data-model.xml. Round-trips cleanly through python-xlsx; Excel-authored Power Pivot payloads continue to pass through byte-for-byte. - Power Query authoring —
wb.add_query(name, source, transform_steps=, refresh_on_open=)emits anxl/connections.xmlentry plus a<DataMashup>customXml sidecar carrying the M-language source. Eight transform-step kinds (parse_csv/parse_json/rename_columns/filter/group_by/join/sort/limit); URL / file / named-range source bindings. Issue #104. - Filters and sort — rich
AutoFilterhelpers (add_top10,add_dynamic_filter,add_custom_filter,add_color_filter,add_icon_filter,add_date_group_filter) andSortStatehelpers (sort_by_value,sort_by_color,sort_by_icon). - Conditional formatting — modern data-bar options
(negativeFillColor, negativeBorderColor, axisColor, axisPosition,
gradient, direction, border) and custom icon-set icons; x14
extension round-trip.
ws.conditional_format(range, kind=...)wizard with twelve+ named kinds ("heatmap","data_bars","icon_set","top_n"/"bottom_n","above_avg"/"below_avg","duplicates"/"unique","greater_than"/"less_than"/"equal_to","text_contains","formula") and a semantic colour palette ("success"/"warning"/"danger"/"primary"/"accent"). - Charts — classic 2007 chart family preserved. Modern chartEx types for Excel 2013+ (treemap, sunburst, funnel, waterfall, box-whisker, histogram, regionMap); drawing anchors round-trip.
- Sparklines — first-class
Sparkline/SparklineGroupmodel withws.add_sparkline(location, data, type=); line / column / stacked (win-loss) types. - Drawings — first-class
Shapewith preset geometry (180+ presets), fill, line, and text;ws.add_shape()creation API; grouped-shape / connector / SmartArt / form-control / content-part preservation. - Comments — threaded comments (
ThreadedComment,PersonList, replies, resolved state) alongside the legacyCommentAPI. Ergonomic authoring:ws['B2'].comment(text, author=...)for legacy notes,ws['B3'].comment_thread().add(text, author=...).reply(...)for Excel-365 threads, plusws.comment_threads()iteration. - Protection — SHA-512 password hashing (
set_password(..., sha512=True)) matching Excel 2013+ output; legacy 16-bit hash retained for back-compat. - Encryption — load and save password-protected
.xlsxviapython-ooxml-crypto(the[encryption]extra). - Column sizing —
ws.autofit_columns()measures displayed values with Pillow font metrics. - Document properties —
docProps/app.xmlround-trips verbatim;docProps/custom.xmlcustom properties preserved. - Formula audit / lint —
wb.audit_formulas(rules=None)returns a list ofFormulaFinding(severity / sheet / cell_ref / rule_id / message). Seven shipped rules:broken-ref,circular-ref,range-extends-past-data,volatile-function,inconsistent-formula,unused-named-range,external-link-stale. Sister ofWorksheet.audit_styles()and theDocument.lint()/Page.lint()siblings in python-docx / python-vsdx. CLI hook:python -m xlsx audit-formulas <file.xlsx>. - Streaming reader —
Workbook.stream_read(path)returns a bounded-memoryStreamingWorkbookcontext manager. Sheets expose a forward-onlyrowsgenerator that yields tuples of cell values (str/int/float/bool/None); the underlyinglxml.etree.iterparseclears each<row>element after yielding so resident memory stays flat regardless of row count. Use this when you need to scan a workbook whose row count would dwarf the eager loader (~1M+ rows). Cells are read-only —set_valueraisesStreamingNotMutableError; for editing, formulas, styles, comments, merged cells, or hyperlinks, use the eagerload_workbook. - Semantic diff —
old.diff(new, level="content")returns aWorkbookDiffdescribing how two workbooks differ. Three levels:"structural"(sheet add/remove only),"content"(per-cell value- formula diffs, default),
"formatting"(adds per-cell number- format / font / fill / border / alignment / protection diffs). Renderers:.summary(counts dict),.changes(orderedChangerecords),.to_markdown()(PR-comment friendly),.to_html(), and.to_xlsx_with_changes_highlighted()(a third.xlsxwith changed cells shaded by kind).
- formula diffs, default),
- Python 3.10+ only —
openpyexcel.compatPython-2 shim removed; package renamed toxlsx, moved tosrc/xlsx/with apy.typedmarker.
Workbook.save(path, reproducible=True) produces a byte-identical
.xlsx for byte-identical inputs across machines and runs:
from xlsx import Workbook
wb = Workbook()
wb.active["A1"] = "Hello"
wb.save("out.xlsx", reproducible=True)The flag stamps every zip-member with the fixed 1980-01-01 timestamp,
emits members in sorted order, and normalises external file
attributes — the three sources of cross-machine nondeterminism a
plain zip_date_time= keyword does not cover. Use it for source-
control-friendly diffs, fixture regeneration, and content-addressable
artefact pipelines; use zip_date_time= directly when you want a
custom timestamp without sort or attribute normalisation. The
matching reproducible= keyword is also accepted by python-docx,
python-pptx, and python-vsdx so cross-format build pipelines
share the same idiom (issue #150).
The Sphinx-built site is auto-deployed to https://loadfix.github.io/python-xlsx/ on every push to master.
To build locally:
pip install sphinx furo myst-parser
python -m sphinx -b html docs docs/_build/html
open docs/_build/html/index.html
Editing real-world .xlsx workbooks (pivots, conditional formatting,
charts, threaded comments, custom XML, signatures, encryption) without
disturbing unrelated parts is a core fork goal. The local fidelity
suite under tests/roundtrip/ exercises Excel-365 fixtures; the
cross-monorepo OPC-level gate lives at
tests/round_trip/ and runs as the
round-trip-fidelity CI job.
The per-feature support matrix (what's "fully preserved" / "preserved
with caveats" / "lossy") across all four parent formats lives at
docs/round-trip-fidelity.md.
Unstable. The public API is still being shaped as the fork
modernises toward Excel-365 fidelity; expect incompatible changes
before a 1.0 line. Not yet published to PyPI. Current version:
2026.05.0 (CalVer, YYYY.MM.patch).
Issues and pull requests are tracked at
https://github.com/loadfix/python-xlsx/issues. See
CONTRIBUTING.md for workflow notes and
CLAUDE.md for architectural orientation. Run the
unit suite with .venv/bin/python -m pytest src/xlsx/ -q and the
round-trip fidelity harness with
.venv/bin/python -m pytest tests/roundtrip/ -v before opening a
PR.
MIT. See LICENCE.md. Inherited from openpyxl via
openpyexcel.
Part of a family of document-rendering libraries:
- docxjs — browser-side DOCX → HTML renderer (TypeScript)
- pptxjs — browser-side PPTX → HTML renderer (TypeScript)
- xlsxjs — browser-side XLSX → HTML renderer (TypeScript)
- python-docx — Python DOCX parser/generator
- python-pptx — Python PPTX parser/generator
- ooxml-validate — Python/.NET OOXML validator (wraps Microsoft Open XML SDK + LibreOffice)