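Currently, python-docx v0.8 does not fully support numbering, so you need to do some hacking.
First, for the demo, you need to write your own iterator to walk the document's paragraphs.
Here is a working one: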
import docx.document
import docx.oxml.table
import docx.oxml.text.paragraph
import docx.table
import docx.text.paragraph


def iter_paragraphs(parent, recursive=True):
    """
    Yield each paragraph within *parent*, in document order.
    Each returned value is an instance of Paragraph. *parent*
    would most commonly be a reference to a main Document object, but
    also works for a _Cell object, which itself can contain paragraphs and tables.
    If *recursive* is true, paragraphs inside nested tables are yielded as well.
    """
    if isinstance(parent, docx.document.Document):
        parent_elm = parent.element.body
    elif isinstance(parent, docx.table._Cell):
        parent_elm = parent._tc
    else:
        raise TypeError(repr(type(parent)))

    for child in parent_elm.iterchildren():
        if isinstance(child, docx.oxml.text.paragraph.CT_P):
            yield docx.text.paragraph.Paragraph(child, parent)
        elif isinstance(child, docx.oxml.table.CT_Tbl):
            if recursive:
                # Descend into the table and yield the paragraphs of every cell.
                table = docx.table.Table(child, parent)
                for row in table.rows:
                    for cell in row.cells:
                        for child_paragraph in iter_paragraphs(cell):
                            yield child_paragraph
You can use it to find all the paragraphs in the document, including those inside table cells.
import docx

document = docx.Document("sample.docx")
for paragraph in iter_paragraphs(document):
    print(paragraph.text)
To access the numbering properties, you need to dig into the "protected" members: paragraph._p.pPr.numPr, which is a docx.oxml.numbering.CT_NumPr object.
for paragraph in iter_paragraphs(document):
    p_pr = paragraph._p.pPr  # None when the paragraph has no <w:pPr> element
    num_pr = p_pr.numPr if p_pr is not None else None
    if num_pr is not None:
        print(num_pr)  # type: docx.oxml.numbering.CT_NumPr
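For orientation, the w:numPr element behind CT_NumPr normally carries two children: w:numId (which numbering instance the paragraph belongs to) and w:ilvl (the list level). The sketch below shows that structure using only the standard library. The XML fragment is hand-written for illustration and read_num_pr is a hypothetical helper, not part of python-docx.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"  # fully qualified name of the w:val attribute

# Hand-written sample paragraph (assumed for illustration, not from a real file).
SAMPLE_P = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:numPr>
      <w:ilvl w:val="1"/>
      <w:numId w:val="3"/>
    </w:numPr>
  </w:pPr>
  <w:r><w:t>Second-level item</w:t></w:r>
</w:p>
"""


def read_num_pr(p_xml):
    """Return (numId, ilvl) for a w:p element, or None if it has no w:numPr."""
    p = ET.fromstring(p_xml)
    num_pr = p.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    num_id = num_pr.find("w:numId", NS)
    ilvl = num_pr.find("w:ilvl", NS)
    # Either child may be absent; report None rather than guessing a default.
    return (
        int(num_id.get(VAL)) if num_id is not None else None,
        int(ilvl.get(VAL)) if ilvl is not None else None,
    )


print(read_num_pr(SAMPLE_P))
```

Note that a paragraph can also inherit its numbering from its style, in which case it has no w:numPr of its own and this lookup returns None.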
Note that the numbering definitions this object refers to live in the numbering.xml file (inside the docx), if it exists.
To access it, you need to read your docx file as a package. For example:
import docx.package
import docx.parts.document
import docx.parts.numbering

package = docx.package.Package.open("sample.docx")

main_document_part = package.main_document_part
assert isinstance(main_document_part, docx.parts.document.DocumentPart)

numbering_part = main_document_part.numbering_part
assert isinstance(numbering_part, docx.parts.numbering.NumberingPart)

ct_numbering = numbering_part._element
print(ct_numbering)  # CT_Numbering
for num in ct_numbering.num_lst:
    print(num)  # CT_Num
    print(num.abstractNumId)  # CT_DecimalNumber
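To show how these pieces connect, here is a rough sketch of the lookup chain numId -> w:num -> w:abstractNum -> w:lvl, again using only the standard library on a hand-written numbering.xml fragment. The sample XML and the level_format helper are assumptions made for illustration, and the sketch ignores w:lvlOverride entries, so treat it as a starting point rather than a complete resolver.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def q(name):
    """Return the fully qualified name for a w:-prefixed attribute."""
    return f"{{{W}}}{name}"


# Hand-written sample (assumed for illustration, not from a real file).
SAMPLE_NUMBERING = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="decimal"/><w:lvlText w:val="%1."/></w:lvl>
    <w:lvl w:ilvl="1"><w:numFmt w:val="lowerLetter"/><w:lvlText w:val="%2)"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="3"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>
"""


def level_format(numbering_xml, num_id, ilvl):
    """Return (numFmt, lvlText) for a numId/ilvl pair, or None if not found."""
    root = ET.fromstring(numbering_xml)
    for num in root.findall("w:num", NS):
        if num.get(q("numId")) != str(num_id):
            continue
        # The w:num element points at its abstract definition.
        abstract_ref = num.find("w:abstractNumId", NS)
        if abstract_ref is None:
            return None
        abstract_id = abstract_ref.get(q("val"))
        for abstract in root.findall("w:abstractNum", NS):
            if abstract.get(q("abstractNumId")) != abstract_id:
                continue
            for lvl in abstract.findall("w:lvl", NS):
                if lvl.get(q("ilvl")) == str(ilvl):
                    num_fmt = lvl.find("w:numFmt", NS)
                    lvl_text = lvl.find("w:lvlText", NS)
                    return (
                        num_fmt.get(q("val")) if num_fmt is not None else None,
                        lvl_text.get(q("val")) if lvl_text is not None else None,
                    )
    return None


print(level_format(SAMPLE_NUMBERING, 3, 1))
```

With a real file, the same XML could be read from the word/numbering.xml member of the docx archive, for instance via zipfile.ZipFile.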
More information is available in the Office Open XML documentation.