Communicating code and data

Using the notebook format and app frameworks to communicate code and data

Thomas H. Simm

Content

  • Notebooks
    • What are they?
    • Examples
    • Pros and cons
  • Apps
    • Voila
    • Streamlit
  • Websites and HTML
    • Converting notebooks to HTML and websites
  • Presentations
    • Using notebooks for presentations
  • Tabular Data
    • Comments on Excel
    • Thoughts on code alternatives

Overview Communicating code

  • Communicating when code is a large element of the presentation

    • Microsoft Word/ppt- type methods aren’t set-up well to include code
    • Programming files (e.g. .py) aren’t set-up well to share or understand what is going on in the code
    • Videoing code with outputs is an option, but doesn’t translate to other formats (i.e. we may also need to do a written format of this)
    • Excel is a good way to present tabular data but if we are only viewing is slow and inefficient
  • Better ways?

    • Programming notebooks (e.g..ipynb) offer a good and easy to share code alongside written content
      • And can be converted to other formats easily such as html
    • Apps (e.g. streamlit) can be good
      • particularly when interactivity is required

Notebooks General

Jupyter Notebooks

From TalkPython: Awesome Jupyter Libraries and Extensions

Jupyter is an amazing environment for exploring data and generating executable reports with Python. But there are many external tools, extensions, and libraries to make it so much better and make you more productive.

  • A notebook consists of two parts
    • markdown part where we can:
      • write text, add images, links, html, LaTeX etc
    • code part which runs and displays output of code

Some links:

Example of a notebook

An example notebook

Markdown in a notebook 1

Some useful commands:

  1. # Notebooks General and ## Markdown in a notebook 1

  2. ![](ghtop_images/pest.png) looks like this

Markdown in a notebook 2

  1. And the same with a mp4 file ![](ghtop_images/revealjs.mp4)

  2. Or a youtube video ""

Markdown in a notebook 3

  1. > If we want text like this

If we want text like this

  1. Or if we want code use `a = b + c`

or:

```

a = b

a = a + c

```


a = b

a = a + c

Markdown in a notebook 4

  1. HTML works too
<img src="ghtop_images/pest.png"></img>

Code in a notebook

Example interactive format using altair:

Simpler code output

x = np.arange(0,np.pi,.01)
y = np.sin(x)
plt.plot(x,y)

Other code stuff

  • Can use shell commands e.g. !pip install pandas
  • Can use magic commands e.g. %%time to time a cell
%%time
y=0
for x in range(0,100):
    y+=x
print(f"y is {y}")
y is 4950
CPU times: total: 0 ns
Wall time: 0 ns

But Not everyone loves notebooks :(

Notebooks have their validish detractors I don’t like notebooks.- Joel Grus Youtube

Notebooks Opinion

Although notebooks have their validish detractors I don’t like notebooks.- Joel Grus Youtube I think if you approach them in the right way they are a super powerful tool.

The negatives seem to be:

  • encourage bad practice in code (a genuine problem)
  • issues around order of what cell is run (easily got around with good practice)
  • issues around lack of auto complete (I don’t see the issue, use in visual studio autocomplete is there)
  • no grammar/spelling correction
  • issues with using git and version control
    • there are ways around this though
  • anything else?

Notebook Benefits

  • Notebooks are intuitive
    • You have the code then the result of the code
    • Can add text or images
    • And it’s linear
  • Can get things up and working quickly
  • Aid with communicating code
  • Encourages Writing
  • Easy to convert code to a pipeline
  • Treat it as a notebook
    • in the same way an artist would have a sketchbook to help make final piece
  • With many companies moving towards Python/R from Excel and a varied level of skills.
    • “Aid with communicating code” is particularly important

Example Useage

Example: Documenting Code

  • Here is my website for my research project on pesticides in UK food

  • This is not the same as documentation for a package but there are parallels

This does a few things:

  • Documents the analysis steps I have taken including the code and outputs
    • Useful for data transparency, useability of the code if needs modifiying/adapting, and why I did XYZ
  • Provides a way to present the data
    • There is a streamlit app, but sometimes I like to be able to see the code

Example: Tool to aid learning

A big area I have been using Jupyter Notebooks for is to aid learning

  • If you want to understand something it helps to write it down
  • Having the code next to it is a big advantage
  • And if stored on github you can access it anywhere

Tensoflow cheat sheet

Example: Debugging Code

  • Since starting at ONS I have been working with understanding an existing project and latterly adding code to it
  • The project consists of multiple python files across several folders
    • My Python was good but lots of the functions and their useage weren’t immediately obvious to me
  • break-points in VS Studio is really good to step through the code and work out what happens in the code.
    • I had not used before with Python (but had lots with MATLAB), and it’s really useful
  • But it can be limited what you can do
    • difficult to probe code if want to write more than 1 line of code
    • the experience/knowledge exists as you go through it but no documentation to refer to later, e.g. function X does this when I give it Y etc

Example: Debugging Code 2

  • By copying and pasting code into Jupyter cells I could see and document how they worked (e.g. changing inputs)
    • This (copying and pasting) would get around code changes too (which would be an issue if modules were just imported)
    • because this was all done in Jupyter notebook I can have a ipynb code file and a html file showing how the code works
    • I could even save a pickle file of the variables at a particularly point to understand how the code would work from this point

Apps

Apps Overview

There are many packages that can be used to convert python code to an app

Using Notebooks directly

Apps without notebooks

  • PySimpleGUI
    • Simple and useful but not the best for displaying data
  • Dash
    • Looks really good, but also super complicated
  • Streamlit
    • Easy and looks good

Voila

  • Voila is relatively simple to use
  • run with something like voila .\Excel_Voila.ipynb
  • converts notebook to an app
  • can use things like ipywidgets for interactivity
  • the reason I’m interested in it’s use is streamlit doesn’t seem to give flexibility to modify table output I’d like

Streamlit Overview

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps. So let’s get started!

Principally used to create apps, but some of the functionality works well for code/data presentations

Streamlit Functionality: overview

Streamlit allows various functionality:

  • textbox
  • images/videos
  • charts/tables
  • menus/buttons
  • etc

Streamlit Functionality: streamlit_layout

But unlike some apps (am thinking MATLAB GUIs) you can’t create the look and functionality separately. So if you want something in a certain position it can be tricky. HTML can be used with st.markdown to give more control but it isn’t recommended to use by streamlit.

Instead, to create the layout as you would like they have the following features:

Streamlit Functionality: columns and sidebar

The most useable are the first two: columns and sidebar

Columns allows us to split the app vertically. The code is fairly simple:

Either colL, colM, colR = st.columns(3) for 3 equal columns or to split columns with different sizes:

colL, _, colR = st.columns((10, 5, 20))
with colL:
    st.write('On the left')
with colR:
    st.write('On the right twice as big as left')

st.sidebar just adds a sidebar to the app that can be hidden or shown.

Anything in the sidebar is just prefixed by st.sidebar so:

st.sidebar.write('I am in the sidebar')
st.write('I am in the main app')
st.sidebar.write('I am back in the sidebar')

Streamlit Functionality: html

It is possible to add various additional personalisations using html. - BUT it does come with security risks and so is [not recommended]](https://github.com/streamlit/streamlit/issues/152)

Allows much more control over the layout of the app that can be useful for a presentation: - Can add a background image - Can add background color to a textbox - Control over positioning of widgets - lots more

HTML is implementated using st.markdown with unsafe_allow_html=True inside the former

Streamlit Functionality: html examples

add background to a text box

text = "Code Examples"
        st.markdown(f'<center><p style=font-family:"Calibri";background-color:#FFFFFF;color:#000000;font-size:42px;border-radius:10%><b>{text}</b></p></center>', unsafe_allow_html=True)

Or to add a background image

import streamlit as st
import base64

@st.cache(allow_output_mutation=True)
def get_base64_of_bin_file(bin_file):
    with open(bin_file, 'rb') as f:
        data = f.read()
    return base64.b64encode(data).decode()

def set_png_as_page_bg(png_file):
    bin_str = get_base64_of_bin_file(png_file) 
    page_bg_img = '''
    <style>
    .stApp {
    background-image: url("data:image/png;base64,%s");
    background-size: contain;
    background-repeat: no-repeat;
    background-attachment: scroll; # doesn't work
    }
    </style>
    ''' % bin_str
    st.markdown(page_bg_img, unsafe_allow_html=True)
    return

Streamlit Functionality: echo

Sometimes you want your Streamlit app to contain both your usual Streamlit graphic elements and the code that generated those elements. That’s where st.echo() comes in

Easier to display this by an example:

In the example above the right of the image is given below (st.columns is used, where the input for the function is found from the left column).

  • st.echo is used with the with statement.
  • everything within the with is printed to the screen and executed
with st.echo():
    # Everything inside this block will be both printed to the screen
    # and executed.

    def do_pd_replace(text, text_search, text_sub):
        col_name = "Start string"
        df = pd.DataFrame(data=[text], columns=[col_name])

        df["Final String"] = df[col_name].replace(
            text_search, text_sub, regex=True
        )

        st.dataframe(df)
        st.write(f"text_search = '{text_search}' and text_sub = '{text_sub}'")
        st.write(f"Input string = '{text}'")
        st.write(f"Output string = '{df['Final String'].values[0]}'")

    do_pd_replace(text, text_search, text_sub)

Streamlit Functionality: pages

By simply creating a folder called pages and putting other streamlit .py files in the folder they can then be accessed in the sidebar.

  • A main file needs to be outside the pages folder
  • The .py files in pages behave as if they were outside the folder (i.e. when loading files/functions)

Example Streamlit Presentation

Notebooks to HTML

What I have used to convert notebooks to html

  • fastpages
    • I have used fastpages, but this is now deprecated and they are recommending the use of quarto
  • quarto
    • So far I have found quarto really good and flexible (N.B. R works too)
    • Easy to convert a notebook to multiple formats, including html, powerpoint, pdf, word doc
    • BUT Quarto is not possible if installing from non pip sources is an issue (as far as I can tell currently)
  • nbconvert is another option I tried
    • but it doesn’t seem to have the functionality of fastpages or quarto
  • Jupyter Books seems to be the best option within companies with installation issues
    • Maybe not as good as quarto but it works!

Others

  • I know some people use Sphinx,
  • nbdev
    • I think is connected to quarto
  • Voila
    • Voilà turns Jupyter notebooks into standalone web applications.
    • Looks good, bit like streamlit
    • but seems to interfere with other libraries
    • mercury seems similar

Creating html (& other formats)

Quarto

Installation is via a package i.e. .msi for Windows or .pkg for Mac. Which can cause issues.

Works with both ipynb and qmd files, which are both a mixture of markdown and executable code.

The only thing that needs to be done with the notebook is add a YAML block at the start of the notebook, like the following (raq not markdown was used):

---
title: "Communicating code: Website"
subtitle: "Using the notebook format for a website"
author: "Thomas H. Simm"
format:
  html:
    toc: true
title-slide-attributes:
  data-background-size: contain
  data-background-opacity: "0.5"
jupyter: python3
---

We can create different files from this .ipynb Jupyter notebook using the following code:

  • quarto render testPres.ipynb --to pptx
  • quarto render testPres.ipynb --to pdf
  • quarto render testPres.ipynb --to html
  • quarto render testPres.ipynb --to revealjs

Quarto

Further, formatting for projects (i.e. for website) can be done within the configuration file _quarto.yml

project:
  type: website
  output-dir: _site

website:
  title: "ThomasHSimm"
  favicon: /posts/Picture3.png
  body-header: <img src="/posts/header2.png" height=200>

  navbar:
    right:
      - about.qmd
      - icon: github
        href: https://github.com/ThomasHSimm
      - icon: mortarboard-fill
        href: https://scholar.google.com/citations?hl=en&user=HdPDn1sAAAAJ
format:
  html:
    theme: 
      light: flatly
      dark: darkly
    css: styles.css

Jupyter Books

We can create different files from this .ipynb Jupyter notebook using the following code:

  • jupyter-book build .\PesticideDocs\
  • jupyter-book build <path-to-book>
  • jupyter-book build <path-to-book> --builder pdfhtml
  • jupyter-book build <path-to-book> --builder singlehtml

The only difference in notebook is that it needs to have One header in a markdown cell for the table of contents, e.g. 

# Title of page

Configuration file

A seperate files _config.yml is used to define how the html (or other) files will look

# Book settings
# Learn more at https://jupyterbook.org/customize/config.html

title: Defra Pesticide Testing, Data Analysis
author: Thomas Simm
logo: ONS-logo.png
exclude_patterns: [_build, Thumbs.db, .DS_Store, "**.ipynb_checkpoints"]


# Force re-execution of notebooks on each build.
# See https://jupyterbook.org/content/execute.html
execute:
  execute_notebooks: force

# Define the name of the latex output file for PDF builds
latex:
  latex_documents:
    targetname: book.tex

# Add a bibtex file so that we can create citations
bibtex_bibfiles:
  - references.bib

# Information about where the book exists on the web
repository:
  url: https://github.com/ThomasHSimm/Pesticide  # Online location of your book
  path_to_book: docs  # Optional path to your book, relative to the repository root
  branch: master  # Which branch of the repository should be used when creating links (optional)

# Add GitHub buttons to your book
# See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
# HTML-specific settings
html:
  favicon                   : "_images/favicon.jpg"  # A path to a favicon image
  use_edit_page_button      : false  # Whether to add an "edit this page" button to pages. If `true`, repository information in repository: must be filled in
  use_repository_button     : false  # Whether to add a link to your repository button
  use_issues_button         : false  # Whether to add an "open an issue" button
  use_multitoc_numbering    : true   # Continuous numbering across parts/chapters
  extra_navbar              : Powered by <a href="https://jupyterbook.org">Jupyter Book</a>
                              <br>Home website <a href="https://thomashsimm.com/">thomashsimm.com</a> # Will be displayed underneath the left navbar.
  extra_footer              : ""  # Will be displayed underneath the footer.
  google_analytics_id       : ""  # A GA id that can be used to track book views.
  home_page_in_navbar       : true  # Whether to include your home page in the left Navigation Bar
  baseurl                   : ""  # The base URL where your book will be hosted. Used for creating image previews and social links. e.g.: https://mypage.com/mybook/
  comments:
    hypothesis              : false
    utterances              : false
  announcement              : "" # A banner announcement at the top of the site.

Table of content

And in addition to the config file a table of contents file is required _toc.yml:

# Table of contents
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: intro
chapters:
- file: Pesticide_Plots
- file: References
- file: UK_areas
- file: using_jupyter_books

Creating a webpage from this

Takes about 30 mins including installing the chosen converter. (But can be done much quicker)

  • create a Github repo for your website
  • choose the converter (e.g. Jupyter Books)
    • And follow their instructions
  • go to settings -> Pages within the repo
    • few options to do
  • Optional: add your own website url to it

Link how to do this here

In Quarto a command from your PC in the repo, publishes the website:

quarto publish quarto-pub

Or equivalently with Jupyter Books:

ghp-import -n -p -f _build/html

Creating directly from the repo

If we instead want to convert notebook files directly from a repo to create a website then this can be done with Netlify.

This is useful if using Gitlab (i.e. not Github) or don’t want all the extra html files cluttering the repo.

Steps:

https://jupyterbook.org/en/stable/publish/netlify.html

  • Sign up and connect Github/Gitlab
  • Add a requirements.txt file and also toc.yml to directory
  • On netlify -> Add new site -> import from an existing repo
  • Insert something like below
    • N.B. the command:
    • pip install -r requirements.txt && jupyter-book build .
    • and folder location

On netlify

Example:

Presentations

Quarto Presentations

Quarto supports a variety of formats for creating presentations, including:

  • revealjs — reveal.js (HTML)

  • pptx — PowerPoint (MS Office)

  • beamer — Beamer (LaTeX/PDF)

I’ll consider the first two

Quarto PowerPoint overview

The steps to make a PowerPoint presentation from a notebook:

  1. Create the inbuilt template.pptx file
  2. Adjust it to match your own template
  3. At the top of the notebook insert format for pptx including the template file
  4. Choose how you will define a new page
  5. You will probably need to manually check the slides and adjust as required
    • especially for interactive content and code

Creating the template

(Office info correct for Office 365 Feb 2023, Version 2301 Build 16.0.16026.20002)

If your workplace has a custom template or you have one you always use, you can incorporate this into quarto.

However, quarto is quite specific on the form this template takes, and requires the following elements

  • Title Slide
  • Title and Content
  • Section Header
  • Two Content
  • Comparison
  • Content with Caption
  • Blank

Creating a ppt template

By selecting Layout from the Home tab in powerpoint the different layouts can be seen

They can then be modified by going to View tab - Slide Master.

Creating a ppt template 2

If using your own template you will need to match the names of the slides given above. These can be found by hovering over the slides on the left or right clicking on one and selecting “Rename Layout”

Creating a ppt template 3

Alternatively, create a custom template using quarto and then modify this. The following command creates the template:

quarto pandoc -o template.pptx --print-default-data-file reference.pptx

Then go to View tab - Slide Master and modify each slide layout.

Note if you are trying to match a template, some tips: - go to Design -> Slide Size and match this to your template - when View tab - Slide Master is selected go to first tab (see above it will be left indented) on one you are copying from and select all on this then paste to the new template - these will be background images and other things that want to be passed to all slides - Check other slides for images and font-styles etc to match to the new template

Load the template

To load the template the first cell in the notebook needs to be modified as follows to reference the template.pptx file.

format:
  pptx:
    reference-doc: template.pptx
    slide-level: 2

In addition, we can also specify here the rule by which a new slide is defined. If slide-level: 2 is used a new slide is defined by “##’ and a new section header by ‘#’. So if we used ‘###’ this would be a heading within the slide.

If slide-level: 1 is used a new slide is defined by “#’ and ‘##’ this would be a heading within the slide (this is normally the default).

Check the slides

I have found creation of slides to powerpoint more prone to strange results than if .doc/.pdf/.html are used.

So check the slides, see if interactive content or code has been included (probably not) and if the slide content goes outside the slide.

In the example above - There is overlap of text on a slide - Strange ouput of a slide - Code output not displayed

Quarto HTML overview

With quarto two different html formats can be produced by using either html or revealjs.

Whereas, html produces the standard html format revealjs produces an interactive presentation format. https://quarto.org/docs/presentations/revealjs/

revealjs does pretty much the same as a powerpoint file but is more robust - interactive content is included - less issues with getting format to fit within the slide

But - Can’t use the ppt template - And maybe(?) there are issues with sharing this format? - Interactive elements not as well implemeneted as within pure html

Adding style to revealjs

A simple way to add template like details to a revealjs file is to add a style.css sheet.

In the example below, the style sheet adds logo.png to the bottom right of each sheet

The file style.css looks like this:

.reveal .slide-logo {
  display: block;
  position: fixed;
  top: unset !important;
  left: unset !important;
  bottom: 50px;
  right: 12px;
  height: 100px !important;
  width: 100x !important;
  max-width: unset !important;
  max-height: unset !important;
}

Adding style to revealjs

And the revealjs part at the top of the jupyter notebook looks like this

revealjs:
    slide-number: true
    height: 1080
    width: 1920
    logo: logo.png
    css: style.css

So this would then look like the following, with the logo (logo.png) in the bottom right, and size and positioning given by the css file

What the revealjs file looks like

Tabular Data

  • How efficient is Excel when we are just viewing the data?
  • What other with code options to Excel are there?

Excel files

ChatGPT: What are the problems of Excel and xlsx files?

  1. Limited scalability: Excel has a limit on the number of rows and columns it can handle effectively.

  2. Inaccurate data: The manual nature of data entry in Excel can lead to errors and inaccuracies.

  3. Fragility: Complex Excel spreadsheets can break easily if any formula or cell reference is altered.

  4. Lack of security: Excel files are vulnerable to unauthorized access and malicious attacks.

  5. Slow performance: Large and complex Excel files can become slow to open and use, leading to decreased productivity.

  6. Compatibility issues: XLSX files may not be compatible with older versions of Excel, or with other software applications.

  7. Limited collaboration: Sharing Excel files and making changes to them can be difficult, especially when multiple people are involved.

For me it is the Slow performance alongside: (a) we are doing the data manipulation outside Excel anyway and (b) having to have another application open

What aspect of tables I am considering

Loading data

ods with Excel

About 10 s to open 3 ods files with Excel

.ods converted to html in firefox browser

About 6 s to open 3 converted ods files in a browser

.ods converted to html firefox browser no new tabs

Almost instant when open converted ods files in same browser

Convert xlsx to html?

  • Opening xlsx files in Excel is slow
  • Converting to html if we don’t want to edit could be an option
  • If we are moving to Python/R aren’t non-Excel options worth considering??

Converting xlsx files to html

  • Seems the most obvious course
  • But it doesn’t seem that easy with code
    • but easy within Excel

pandas

  • Since (in Python) we are mainly working with pandas. Let’s consider how pandas outputs can be modified.
  • pandas options

Some code functionality

# precision of all columns
pd.set_option("display.precision", 2)
# Or map as a string
df2['amount_pc_str'] = df2['amount_pc'].map(lambda x: '%.3f' % x)
# some other options
pd.set_option('max_colwidth', 20)
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', 0)

pandas basic

sample_id date_of_sampling description country_of_origin retail_outlet address brand_name packer_/_manufacturer_/_importer product address_postcode packer_postcode address_area packer_area chem_name amount_detected mrl amount_pc
0 1958/2016 2016-08-08 Bramley Apples UK Asda Creechbarrow Road, Taunton TA1 2AN Asda Asda Stores Ltd Leeds, UK LS11 5AD Apple TA1 2AN LS11 5AD Somerset West Yorkshire boscalid 0.03 2.0 0.015
1 1958/2016 2016-08-08 Bramley Apples UK Asda Creechbarrow Road, Taunton TA1 2AN Asda Asda Stores Ltd Leeds, UK LS11 5AD Apple TA1 2AN LS11 5AD Somerset West Yorkshire pyraclostrobin 0.01 0.5 0.020
2 0230/2016 2016-08-08 Bramley Apples UK Co-op Northgate, Louth LN11 0LT Co-op Co-operative Group Ltd Manchester M60 0AG Apple LN11 0LT M60 0AG Lincolnshire Greater Manchester boscalid 0.05 2.0 0.025
3 0230/2016 2016-08-08 Bramley Apples UK Co-op Northgate, Louth LN11 0LT Co-op Co-operative Group Ltd Manchester M60 0AG Apple LN11 0LT M60 0AG Lincolnshire Greater Manchester flonicamid (sum) 0.02 0.2 0.100
4 0230/2016 2016-08-08 Bramley Apples UK Co-op Northgate, Louth LN11 0LT Co-op Co-operative Group Ltd Manchester M60 0AG Apple LN11 0LT M60 0AG Lincolnshire Greater Manchester pyraclostrobin 0.03 0.5 0.060
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35155 2858/2020 Organic 2020-10-20 Organic Sweet Potatoes Spain Tesco 300 Beverley Way, New Malden KT3 4PJ Tesco Tesco Stores Ltd Welwyn Garden City AL7 1GA Sweet_Potatoes_Q4_(BNA) KT3 4PJ AL7 1GA Greater London Hertfordshire 0 0.00 0.0 0.000
35156 0562/2020 Organic 2020-10-05 Organic Duchy Sweet Potatoes Egypt Waitrose Mill Lane, Swindon SN1 7BX Waitrose Waitrose Ltd Doncastle Road, Bracknell, Berksh... Sweet_Potatoes_Q4_(BNA) SN1 7BX RG12 8YA Wiltshire Berkshire 0 0.00 0.0 0.000
35157 0563/2020 2020-10-05 Sweet Potatoes USA Waitrose Mill Lane, Swindon SN1 7BX Waitrose Waitrose Ltd Doncastle Road, Bracknell, Berksh... Sweet_Potatoes_Q4_(BNA) SN1 7BX RG12 8YA Wiltshire Berkshire 0 0.00 0.0 0.000
35158 2601/2020 2020-10-14 Sweet Potatoes USA Waitrose Ossington Way, Newark NG24 1FF Waitrose Waitrose Ltd Doncastle Road, Bracknell, Berksh... Sweet_Potatoes_Q4_(BNA) NG24 1FF RG12 8YA Nottinghamshire Berkshire 0 0.00 0.0 0.000
35159 2601/2020 2020-10-14 Sweet Potatoes USA Waitrose Ossington Way, Newark NG24 1FF Waitrose Waitrose Ltd Doncastle Road, Bracknell, Berksh... Sweet_Potatoes_Q4_(BNA) NG24 1FF RG12 8YA Nottinghamshire Berkshire 0 0.00 0.0 0.000

35160 rows × 17 columns

pandas overview

  • Using pandas we can control various outputs
  • But these still need a format to display within
  • And display functionality is not easy

Or convert to a html file

df2.iloc[:500].to_html('df2_500.html')

But using a style sheet as shown in stack overflow by Parfait

ipydatagrid

https://github.com/bloomberg/ipydatagrid

itables code

from itables import init_notebook_mode

import itables
init_notebook_mode(all_interactive=True)

itables.show(df2)
sample_id date_of_sampling description country_of_origin retail_outlet address brand_name packer_/_manufacturer_/_importer product address_postcode packer_postcode address_area packer_area chem_name amount_detected mrl amount_pc
Loading... (need help?)

Dash

https://dash.plotly.com/datatable

Downloaded 800,000 times per month, Dash is the original low-code framework for rapidly building data apps in Python, R, Julia, and F# (experimental).

https://medium.com/plotly/introducing-jupyterdash-811f1f57c02e

OSError: Address 'http://127.0.0.1:8050' already in use.
    Try passing a different port to run_server.

Streamlit

A faster way to build and share data apps

  • Dash can be run within a notebook but is principally an app.
  • Streamlit is a similar app.
  • But much easier to code.
import pandas as pd
import streamlit as st
all_dfs = pd.read_csv("./data/combined_df.csv")
st.dataframe(all_dfs.head())
# Or
st.tables(all_dfs)

And more

DataTables

DataTables is a plug-in for the jQuery Javascript library. It is a highly flexible tool, built upon the foundations of progressive enhancement, that adds all of these advanced features to any HTML table.

Jupyter widgets

If you are looking for Jupyter widgets, have a look at (taken from https://mwouts.github.io/itables/references.html) - QGrid by Quantopian - IPyaggrid by Louis Raison and Olivier Borderies - IPySheet by QuantStack.

More details on itables

From my brief review I found itables the best package

  • It works
  • It gives lots of control of table output to be consistent with good-practice
    • Column width
    • Number formatting
    • Column alignment
  • Alongside
    • Search
    • Column ordering
    • How many rows are shown
    • Scrolling options
    • etc

itables code1

def _indi_columnDefs(cols, format,col_width):
    if 'num_format' in format:
        if format['num_format'] == "#,##0":
            format_str = "',', '.', 0, ''"
        elif format['num_format'] ==  '0.0':
            format_str = "',', '.', 1, ''"
        else: 
            format_str = "',', '.', 3, ''"
    else:
        format_str = "',', '.', 3, ''"
    columnDefs= {
            "targets": cols,
            "className":f"dt-{format['align']}",
            "render": JavascriptCode(f"$.fn.dataTable.render.number({format_str})"),
            "width": f"{col_width}px",
        }
    return columnDefs

itables code2

style = {
    "general": {
        "table_output" :{
            "pct_cols" : {
                "regex_defined": ["%" , "[Pp]ercentage"],
                "format": {"num_format": "0.0", "align": "right"}
            },
            "total_cols" : {
                    "regex_defined": ["[Tt]otal" ],
                    "format": { "num_format": "0.0", "align": "center"}
            },
            "code_cols" : {
                "regex_defined": ["[Cc]ode"],
                "format": { "align": "left"}
            },
            "_else" : {
                "regex_defined": [],
                "format": {"num_format": "#,##0", "align": "right"}
            }
        
        },
        "col_width": 130.0
    }
}

itables code3


def _create_style_col(style: dict, df: pd.DataFrame):
    style_col = {}
    colsAll=[]
    for keys in style['general']['table_output'].keys():
        if keys !='_else':
            # print(">>",keys)
            cols=[]
            for i,column in enumerate(df.columns):
                # print(column)
                if [True for reg in style['general']['table_output'][keys]['regex_defined'] if re.search(reg,column)]:
                    # print(i,column)
                    cols.append(i)
            # print(cols)
            colsAll = colsAll + cols
            style_col[keys]=cols

    style_col['_else'] = [ i for i,column in enumerate(df.columns) if i not in colsAll]

    return style_col

itables code4

def create_columnDefs(style: dict, df: pd.DataFrame):
    
    style_col = _create_style_col(style, df)
    columnDefs = []
    for keys in style['general']['table_output'].keys():
        columnDefs.append( _indi_columnDefs(style_col[keys],
                                            style['general']['table_output'][keys]['format'],
                                            style['general']['col_width']) )
    return columnDefs

itables code5


columnDefs= create_columnDefs(style, data)
show(
    data,
    # style="table-layout:auto;width:80%;float:left",
    classes="display",
    
    # specify how many rows
    lengthMenu = [25,100,-1],
    # or to scroll through data
    scrollX=True,
    # scrollY="800px", 
    scrollCollapse=True, 
    # paging=False,

    style=f"width:{style['general']['col_width']*10}px",
    autoWidth=False,

    # add footer
    # footer=True,
    
    columnDefs=columnDefs,
    tags =f'<caption style="caption-side: Bottom">File: {files[0]} and Tab: {ansa.value}</caption>'
)

Questions?