Architecture¶
nbprint is a collection of pydantic models that take an existing Jupyter notebook, or construct one from
a set of standard parts, execute it with nbconvert, and convert it to html or pdf
with a standard template, optionally running pagedjs to provide a print-oriented layout.
It can be run against an existing notebook or driven by the provided YAML-based configuration framework.
graph TB
nb("notebook<br>(.ipynb)")
nbc{nbconvert}
nbct[/nbprint <br> template/]
pjs[/paged.js <br> layout engine/]
o@{ shape: doc, label: "output (html,pdf,etc)" }
nb e2@--->nbc
e2@{animate: true}
nbct --> nbc
pjs --- nbct
nbc e3@-->o
e3@{animate: true}
Components¶
nbprint provides a core Configuration object with parameters for controlling:
- Parameters: input parameters (like papermill)
- Outputs: output assets, generally using nbconvert to create an html or pdf document
- Page: print-media specific page elements, like header/footer, page numbers, etc.
- Context: a shared object instantiated in our notebook and passed to every content cell. This allows us to represent notebook “state” as a typed pydantic model.
- Content: a structured object representing the actual cells in our notebook
Configuration¶
graph LR
subgraph Configuration
pfile@{ shape: doc, label: "Parameters file<br>(json,jsonl,CLI)" }
paramyaml>yaml]
param["Parameters"]
configyaml> yaml]
config["Configuration"]
ctxyaml>yaml]
ctx["Context"]
pageyaml>yaml]
page["Page"]
cntyaml>yaml]
cnt["Content"]
nb@{ shape: doc, label: "Existing Notebook<br>(.ipynb)" }
outyaml>yaml]
out["Outputs"]
end
subgraph Notebook
pcell(Parameters Cell)
configcell(Configuration Cell)
pagecell(Page Cell)
ctxcell(Context Cell)
contentcell("Content Cell/s")
outputcell(Outputs Cell)
end
paramyaml eparamyamlparam@---> param
eparamyamlparam@{animate: true}
pfile epfileparam@--->param
epfileparam@{animate: true}
param --> config
ctxyaml ectxyamlctx@---> ctx
ectxyamlctx@{animate: true}
ctx --> config
pageyaml epageyamlpage@---> page
epageyamlpage@{animate: true}
page --> config
nb enbcnt@--->cnt
enbcnt@{animate: true}
cntyaml ecntyamlcnt@---> cnt
ecntyamlcnt@{animate: true}
cnt --> config
outyaml eoutyamlout@---> out
eoutyamlout@{animate: true}
out --> config
configyaml econfigyamlconfig@---> config
econfigyamlconfig@{animate: true}
pcell---configcell
configcell---pagecell
pagecell---ctxcell
ctxcell---contentcell
contentcell --- outputcell
subgraph Output
o@{ shape: doc, label: "output (html,pdf,etc)" }
end
post(Post Processing)
Configuration --> Notebook
Notebook --> Output
Output eOutputPosProcessing@---> post
eOutputPosProcessing@{animate: true}
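As a rough illustration of the cell order in the diagram above, the sketch below assembles a notebook as a plain nbformat-v4 JSON dictionary. It is not nbprint's implementation: the helper names, the tag names, and the placeholder cell sources are all hypothetical, and the real configuration, page, context, and outputs cells will contain whatever nbprint generates.

```python
import json


def code_cell(source: str, tag: str) -> dict:
    """Build a minimal nbformat-v4 code cell carrying one (hypothetical) tag."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": [tag]},
        "outputs": [],
        "source": source,
    }


def build_notebook(parameters: str, content: list[str]) -> dict:
    """Order cells as in the diagram: parameters, configuration, page,
    context, content cell/s, outputs."""
    cells = [
        code_cell(parameters, "parameters"),
        code_cell("# configuration placeholder", "configuration"),
        code_cell("# page placeholder", "page"),
        code_cell("# context placeholder", "context"),
        *[code_cell(source, "content") for source in content],
        code_cell("# outputs placeholder", "outputs"),
    ]
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


notebook = build_notebook('a = "test"', ["print(a)"])
print(json.dumps([cell["metadata"]["tags"][0] for cell in notebook["cells"]]))
```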
Parameters¶
Parameters are the first cell of a notebook, and can be passed in during execution to allow for parameterized notebooks.
We provide the following builtin versions:
PapermillParameters¶
hydra config: nbprint/parameters/papermill
This is a basic object that takes values of any basic JSON-serializable type and emits them as assignments in the first cell.
As an example, the following YAML:
# @package nbprint.parameters
_target_: nbprint.PapermillParameters
a: abc
b: 1.2
c: true
Would result in the following cell:
a = "abc"
b = 1.2
c = True
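The mapping from YAML values to assignments can be sketched in a few lines of Python. This is only an illustration of the idea, not the code PapermillParameters uses; `render_parameters_cell` is a hypothetical helper name.

```python
import json


def render_parameters_cell(parameters: dict) -> str:
    """Render JSON-style values as one Python assignment per line."""
    lines = []
    for name, value in parameters.items():
        # json.dumps yields double-quoted strings; repr yields Python
        # literals such as True and 1.2 for everything else.
        literal = json.dumps(value) if isinstance(value, str) else repr(value)
        lines.append(f"{name} = {literal}")
    return "\n".join(lines)


print(render_parameters_cell({"a": "abc", "b": 1.2, "c": True}))
```

With the values from the YAML above, this prints the same three assignments shown in the resulting cell.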
Page¶
Context¶
Context is used to wrap variables local to the notebook. The best documentation is a simple example in YAML form:
---
_target_: nbprint.Configuration
context:
  _target_: nbprint.example.ExampleContext
content:
  - _target_: nbprint.ContentCode
    content: |
      nbprint_ctx.string = string
  - _target_: nbprint.ContentCode
    content: |
      print(nbprint_ctx.string)
This will create two ContentCode instances, where one sets a value string on the ExampleContext instance and the other reads it.
You can of course rely on notebook-global variables, but relying on typed contexts makes it easier to build modular reports.
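The shared-context idea can be sketched without nbprint installed. The example below uses a stdlib dataclass as a stand-in for the pydantic model and runs two cell sources against one namespace. The `run_cells` helper and the literal `"hello"` are assumptions for illustration; only the `nbprint_ctx` name and the `string` attribute come from the YAML above.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class ExampleContext:
    # Stand-in for a typed pydantic context; one declared field.
    string: str = ""


def run_cells(cells: list[str], ctx: ExampleContext) -> str:
    """Execute cell sources in order against a shared namespace and
    return everything they printed."""
    namespace = {"nbprint_ctx": ctx}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for source in cells:
            exec(source, namespace)
    return buffer.getvalue()


cells = [
    'nbprint_ctx.string = "hello"',  # first cell writes to the context
    "print(nbprint_ctx.string)",     # second cell reads it back
]
print(run_cells(cells, ExampleContext()), end="")
```

Because both cells go through the same typed object, a report section can declare exactly which fields it reads and writes instead of depending on arbitrary globals.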
Content¶
Content is the basic form of displayable content.
It can be used to wrap any generic functionality or Markdown content.
It can also be convenient to reuse display configuration.
Content has a few key attributes:
- Content.content: string text content, or a list[Content] of subcontent for layout elements
- Content.style: A Style element based on CSS for styling this content
- css: Generic string content to be injected into a <style> tag scoped to this cell
- esm: Generic string content to be injected into a <script> tag scoped to this cell. It is expected to contain a function render(cell_nbprint_metadata_as_json, cell_dom_element).
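The recursive shape of Content.content (either text, or a list of further Content objects) can be sketched as below. This is a simplified stand-in built on a dataclass, not nbprint's pydantic model, and the `flatten` helper is hypothetical: how nbprint actually turns layout elements into cells is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Content:
    # Either text for a single cell, or child Content objects for a layout.
    content: str | list[Content] = ""
    css: str = ""


def flatten(node: Content) -> list[str]:
    """Collect leaf text depth-first, in document order."""
    if isinstance(node.content, str):
        return [node.content]
    leaves: list[str] = []
    for child in node.content:
        leaves.extend(flatten(child))
    return leaves


layout = Content(content=[
    Content("# Title", css=":scope { text-align: center; }"),
    Content(content=[Content("left"), Content("right")]),
])
print(flatten(layout))
```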
The following items are provided:
ContentCode¶
Content that is executed as a code cell.
ContentMarkdown¶
Content that is rendered as a Markdown cell.
ContentImage¶
ContentTableOfContents¶
Layout Elements¶
- ContentLayout
- ContentInlineLayout
- ContentFlexColumnLayout
- ContentFlexRowLayout
- ContentPageBreak
Library Configuration Elements¶
- LoggingConfig
- PandasDisplayConfiguration
- PlotlyRendererConfiguration
- SeabornDisplayConfiguration
Outputs¶
nbprint can produce a variety of outputs based on nbconvert.
It can also postprocess these outputs based on content, for example to email a report only if a certain cell returns True.
The following defaults are provided:
Outputs¶
This is the base class for all outputs. It has a few key attributes:
- .root: Base path where output artifacts will generate
- .naming: naming convention to use for output artifacts. This is particularly useful when producing many artifacts. It is templatized via Jinja2 with the following arguments:
  - name: name of the notebook
  - date: current date as ISO format
  - datetime: current datetime as ISO format
  - uuid: a generated UUID
  - sha: a hash of the Configuration object
  - any parameters set in the Configuration.parameters object
- .hook: a python callable path to be invoked after notebook generation
- .postprocess: a python callable path to be invoked at the very end of the Configuration run/s
In particular, the hooks can be used to get behavior like: only send a report via email if XYZ condition is (not) satisfied.
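To make the naming arguments concrete, here is a small sketch that builds them and fills in a template. Several things are assumptions: nbprint renders names through Jinja2, whereas this example uses a tiny regex substitution as a dependency-free stand-in; the way the sha is computed (SHA-256 over sorted JSON) is a guess; and `naming_arguments` and `render_name` are hypothetical helper names.

```python
import hashlib
import json
import re
import uuid
from datetime import datetime


def naming_arguments(name: str, config: dict, parameters: dict, now: datetime) -> dict:
    """Collect the template arguments listed above."""
    # Assumption: hash a stable JSON dump of the configuration.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    arguments = {
        "name": name,
        "date": now.date().isoformat(),
        "datetime": now.isoformat(),
        "uuid": str(uuid.uuid4()),
        "sha": digest,
    }
    # Parameters are exposed alongside the built-in arguments.
    arguments.update(parameters)
    return arguments


def render_name(template: str, arguments: dict) -> str:
    """Stand-in for Jinja2: replace each {{ key }} with its value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(arguments[match.group(1)]),
        template,
    )


arguments = naming_arguments(
    "report", {"name": "report"}, {"a": "test"}, datetime(2024, 1, 2, 3, 4, 5)
)
print(render_name("{{name}}-{{date}}-{{a}}", arguments))
```

Including a parameter such as `a` in the template is what keeps artifacts distinct when the same notebook is run with many parameter sets.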
NBConvertOutputs¶
hydra config: nbprint/outputs/default
This Outputs subclass runs nbconvert to produce an output document.
It supports the following configuration options:
- target: nbconvert target, in html, pdf, or notebook
- execute: whether or not to reexecute the notebook, defaults to True
- timeout: execution timeout, defaults to 600s
- template: nbconvert template to use, defaults to "nbprint"
Additionally, there are two extra hooks that can be set:
- execute_hook: Called after nbconvert execution of the notebook
- nbconvert_hook: Called after nbconvert conversion of the notebook
PDFOutputs¶
hydra config: nbprint/outputs/pdf
Same as NBConvertOutputs, but with target=pdf.
NotebookOutputs¶
hydra config: nbprint/outputs/notebook
Same as NBConvertOutputs, but with target=notebook.
HTMLOutputs¶
hydra config: nbprint/outputs/html
Same as NBConvertOutputs, but with target=html.
WebHTMLOutputs¶
hydra config: nbprint/outputs/webhtml
Same as NBConvertOutputs, but with target=webhtml.
NBConvertShortCircuitOutputs¶
A specialized NBConvertOutputs that stops output processing if a cell tagged nbprint:output:stop returns True.
- execute_hook: A Python import path to a function to evaluate; see nbprint.config.outputs.nbconvert.short_circuit_hook as an example
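The kind of check such a hook performs can be sketched against the notebook's JSON structure. This is not the code of short_circuit_hook, whose signature is not shown here; `should_stop` is a hypothetical function that takes an executed notebook as a plain nbformat-v4 dictionary and looks for a tagged cell whose result is True.

```python
STOP_TAG = "nbprint:output:stop"


def should_stop(notebook: dict) -> bool:
    """Return True if any cell tagged STOP_TAG produced the result True."""
    for cell in notebook.get("cells", []):
        if STOP_TAG not in cell.get("metadata", {}).get("tags", []):
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") != "execute_result":
                continue
            text = output.get("data", {}).get("text/plain", "")
            # nbformat may store multi-line strings as a list of lines.
            if isinstance(text, list):
                text = "".join(text)
            if text.strip() == "True":
                return True
    return False


executed = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": [STOP_TAG]},
            "outputs": [
                {"output_type": "execute_result", "data": {"text/plain": "True"}}
            ],
        }
    ]
}
print(should_stop(executed))
```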
EmailOutputs¶
hydra config: nbprint/outputs/email
Inherits from NBConvertOutputs and attaches the output to an email using a prebuilt postprocess hook.
- body: Content of the email, defaults to the output name
- subject: Content of the subject, defaults to the output name
- to: Email recipient/s
- from_: Email sender
- cc: CC
- bcc: BCC
- smtp: SMTP configuration
  - host: SMTP server host
  - port: SMTP server port
  - user: SMTP server user
  - password: SMTP server password
  - tls: Enable TLS
  - ssl: Enable SSL
  - timeout: Timeout for SMTP connection
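For orientation, the fields above map naturally onto the standard library's email API. The sketch below builds a message with an attachment but does not send it. It is an illustration under assumptions, not EmailOutputs' actual postprocess hook: `build_report_email` is a hypothetical helper, and the addresses are placeholders.

```python
from email.message import EmailMessage


def build_report_email(to, from_, subject, body, attachment: bytes, filename: str,
                       cc=(), bcc=()) -> EmailMessage:
    """Assemble a message carrying one output artifact as an attachment."""
    message = EmailMessage()
    message["To"] = ", ".join(to)
    message["From"] = from_
    message["Subject"] = subject
    if cc:
        message["Cc"] = ", ".join(cc)
    if bcc:
        message["Bcc"] = ", ".join(bcc)
    message.set_content(body)
    # Generic binary attachment; a real hook might pick a specific MIME type.
    message.add_attachment(
        attachment, maintype="application", subtype="octet-stream", filename=filename
    )
    return message


message = build_report_email(
    to=["reader@example.com"],
    from_="reports@example.com",
    subject="report",
    body="report",
    attachment=b"<html></html>",
    filename="report.html",
)
print(message["Subject"], [part.get_filename() for part in message.iter_attachments()])
```

Sending would then be a matter of passing the message to `smtplib.SMTP(host, port, timeout=...)`, optionally after `starttls()`, using the smtp settings listed above.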
Configuration¶
hydra lets us mix and match the various components defined in YAML, or even build our own.
It's easy to combine configuration for content, outputs, layout, and more.
It also lets us tweak existing pydantic-defined options from the command line.
Let’s look at some examples of how powerful this can be:
In this command, we run a default nbprint.Configuration but use content from an existing notebook.
# Create HTML from notebook as-is
nbprint examples/basic.ipynb
In this command, we tweak the nbprint.Configuration object’s .name attribute to be "test", and we tweak the nbprint.Configuration object’s outputs subobject’s .target attribute to be "pdf".
nbprint examples/basic.ipynb +nbprint.name=test ++nbprint.outputs.target=pdf
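The `+` and `++` prefixes are hydra's override syntax: a bare `key=value` changes an existing key, `+key=value` adds a new key, and `++key=value` adds the key or overrides it if it already exists. The sketch below mimics those rules on a nested dictionary. It is a simplified model rather than hydra's parser (values stay strings, and the starting configuration is a made-up example), and `apply_override` is a hypothetical helper.

```python
def apply_override(config: dict, override: str) -> None:
    """Apply one dotted override such as '++nbprint.outputs.target=pdf' in place."""
    key, _, value = override.partition("=")
    if key.startswith("++"):
        mode, path = "upsert", key[2:]   # add, or override if present
    elif key.startswith("+"):
        mode, path = "add", key[1:]      # must not exist yet
    else:
        mode, path = "set", key          # must already exist
    *parents, leaf = path.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    if mode == "add" and leaf in node:
        raise KeyError(f"{path} already exists; use ++ to override it")
    if mode == "set" and leaf not in node:
        raise KeyError(f"{path} does not exist; use + to add it")
    node[leaf] = value


# Hypothetical starting point: an outputs target is set, a name is not.
config = {"nbprint": {"outputs": {"target": "html"}}}
for override in ["+nbprint.name=test", "++nbprint.outputs.target=pdf"]:
    apply_override(config, override)
print(config)
```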
In this command, we tweak the nbprint.Configuration object’s parameters subobject to add a new key "a" with value "test".
# First cell is papermill-style parameters
nbprint examples/parameters.ipynb +nbprint.parameters.a=test
This will inject the following first cell into our notebook:
a = "test"
This allows us to integrate nicely with papermill notebooks, and build parameterized reports to produce different outputs from the same skeleton notebook.
In this command, we take an existing notebook content and inject some content as frontmatter.
# Overlay a config group, e.g. title and table of contents
nbprint examples/basic.ipynb nbprint/content/frontmatter=nbprint/title_toc
In particular, we grab the code defined in nbprint/config/hydra/content/frontmatter/nbprint/title_toc.yaml:
# @package nbprint.content.frontmatter
- _target_: nbprint.ContentMarkdown
  content: |
    # ${nbprint.name}
  css: ":scope { text-align: center; }"
- _target_: nbprint.ContentPageBreak
- _target_: nbprint.ContentTableOfContents
- _target_: nbprint.ContentPageBreak
Similar to the previous command, this command swaps the nbprint.Configuration object’s outputs subobject for the
object defined in nbprint/config/hydra/outputs/nbprint/pdf.yaml:
# Create PDF via WebPDF by using hydra to swap out outputs type
nbprint examples/basic.ipynb nbprint/outputs=nbprint/pdf
That content just defines a different nbprint.Outputs object in YAML:
# @package nbprint.outputs
_target_: nbprint.PDFOutputs
target: webpdf
Finally, we could accomplish the same thing by tweaking the default Outputs object’s .target attribute to be webpdf.
# Create PDF via WebPDF same as above by using hydra to tweak the default output target
nbprint examples/basic.ipynb ++nbprint.outputs.target=webpdf
By providing the ability to do deep dependency injection in existing objects and/or swap objects wholesale,
we can customize, tweak, and extend nbprint as much as we need.