import pandas as pd
import os
from pathlib import Path
import sys
module_path = Path( os.getcwd() )
module_path = str(module_path.parent.parent.parent) + '\\Pesticide'
cwd = module_path
folder_path = os.path.join(cwd,'data')
sys.path.insert(0, module_path)
df2 = pd.read_csv(os.path.join(folder_path,'combined_df.csv') ,index_col=0 )
# change data type of columns
df2['date_of_sampling'] = pd.to_datetime(df2['date_of_sampling'])
Communicating code and data
Using the notebook format to communicate code and data
Content
- Notebooks overview
- Converting Notebooks
- Example usage of notebooks
Notebooks
Jupyter Notebooks
From TalkPython: Awesome Jupyter Libraries and Extensions
Jupyter is an amazing environment for exploring data and generating executable reports with Python. But there are many external tools, extensions, and libraries to make it so much better and make you more productive.
- A notebook consists of two parts
- markdown part where we can:
- write text, add images, links, html, LaTeX etc
- code part which runs and displays output of code
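To make the two parts concrete, here is a minimal sketch of what an .ipynb file looks like on disk: a JSON document holding a list of markdown and code cells. The cell contents are made up for illustration, and the layout assumes the version 4 notebook format.

```python
import json

# A hypothetical two-cell notebook: one markdown cell, one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Round-trip through JSON text, as it would be stored in the .ipynb file.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the file is plain JSON, converters such as the ones discussed later can read the cells and render them to other formats.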
Some links:
Example of a notebook
An example notebook
Example of a notebook: output
import matplotlib.pyplot as plt
plt.plot(df2['date_of_sampling'])
Example of a notebook: output 2
import altair as alt
from vega_datasets import data
movies = alt.UrlData(
    data.movies.url,
    format=alt.DataFormat(parse={"Release_Date":"date"})
)
ratings = ['G', 'NC-17', 'PG', 'PG-13', 'R']
genres = ['Action', 'Adventure', 'Black Comedy', 'Comedy',
          'Concert/Performance', 'Documentary', 'Drama', 'Horror', 'Musical',
          'Romantic Comedy', 'Thriller/Suspense', 'Western']
base = alt.Chart(movies, width=200, height=200).mark_point(filled=True).transform_calculate(
    Rounded_IMDB_Rating = "floor(datum.IMDB_Rating)",
    Hundred_Million_Production = "datum.Production_Budget > 100000000.0 ? 100 : 10",
    Release_Year = "year(datum.Release_Date)"
).transform_filter(
    alt.datum.IMDB_Rating > 0
).transform_filter(
    alt.FieldOneOfPredicate(field='MPAA_Rating', oneOf=ratings)
).encode(
    x=alt.X('Worldwide_Gross:Q', scale=alt.Scale(domain=(100000,10**9), clamp=True)),
    y='IMDB_Rating:Q',
    tooltip="Title:N"
)
# A slider filter
year_slider = alt.binding_range(min=1969, max=2018, step=1)
slider_selection = alt.selection_single(bind=year_slider, fields=['Release_Year'], name="Release Year_")
filter_year = base.add_selection(
    slider_selection
).transform_filter(
    slider_selection
).properties(title="Slider Filtering")
# A dropdown filter
genre_dropdown = alt.binding_select(options=genres)
genre_select = alt.selection_single(fields=['Major_Genre'], bind=genre_dropdown, name="Genre")
filter_genres = base.add_selection(
    genre_select
).transform_filter(
    genre_select
).properties(title="Dropdown Filtering")
#color changing marks
rating_radio = alt.binding_radio(options=ratings)
rating_select = alt.selection_single(fields=['MPAA_Rating'], bind=rating_radio, name="Rating")
rating_color_condition = alt.condition(rating_select,
                                       alt.Color('MPAA_Rating:N', legend=None),
                                       alt.value('lightgray'))
highlight_ratings = base.add_selection(
    rating_select
).encode(
    color=rating_color_condition
).properties(title="Radio Button Highlighting")
# Boolean selection for format changes
input_checkbox = alt.binding_checkbox()
checkbox_selection = alt.selection_single(bind=input_checkbox, name="Big Budget Films")
size_checkbox_condition = alt.condition(checkbox_selection,
                                        alt.SizeValue(25),
                                        alt.Size('Hundred_Million_Production:Q')
                                        )
budget_sizing = base.add_selection(
    checkbox_selection
).encode(
    size=size_checkbox_condition
).properties(title="Checkbox Formatting")
( filter_year | filter_genres) & (highlight_ratings | budget_sizing )
Communicating when code is a large element of what is being presented
- Microsoft Word/PowerPoint-type tools aren’t set up well to include code
- Programming files (e.g. .py) aren’t set up well for sharing
- Videoing code with its outputs is an option, but video doesn’t translate to other formats (i.e. we may also need to produce a written version)
- Apps (e.g. streamlit) can be good.
- But the code is hidden
- Programming notebooks (e.g. .ipynb) offer a good and easy way to share code, but with some limitations
An easier way is to convert the notebooks to html
- e.g. maybe someone doesn’t have Python installed
Notebook Benefits
- Notebooks are intuitive
- You have the code then the result of the code
- Plus can add details of how code works
- And it’s linear
- Can get things up and working quickly
- Aid with communicating code
- Encourages Writing
- and writing things down aids thinking in the now and understanding what you did and why in the future
- Can use shell commands e.g. !pip install pandas
- Can use magic commands e.g. %%time to time a cell
With the ONS moving from Excel towards Python/R, and with skill levels varying widely, the first of these is particularly important for communicating code.
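The shell and magic commands above only work inside a notebook (they are IPython features, not Python syntax). As a rough sketch of what they save you from writing, here is an approximate plain-Python equivalent of timing a cell with %%time and of running a shell command with a leading "!". The workload and the command are placeholders chosen for illustration.

```python
import subprocess
import sys
import time

# Roughly what %%time does for a cell: measure wall-clock time around the code.
start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # placeholder workload
elapsed = time.perf_counter() - start
print(f"Wall time: {elapsed:.4f} s")

# Roughly what a leading "!" does: run a command in a subprocess and show its output.
# Here we ask the current Python interpreter for its version.
result = subprocess.run(
    [sys.executable, "--version"], capture_output=True, text=True
)
print(result.stdout.strip())
```

In a notebook both steps collapse to a single line each, which is part of why notebooks are quick to get working.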
Converting Notebooks
What I have used to convert notebooks
- fastpages
- Previously I converted notebooks to html via fastpages but this is now deprecated and they are recommending the use of quarto.
- quarto
- So far I have found quarto really good and flexible (N.B. R works too)
- Easy to convert a notebook to multiple formats, including html, powerpoint, pdf, word doc
- BUT Quarto cannot be used within ONS (as far as I can tell currently)
- nbconvert is another option I tried
- but it doesn’t seem to have the functionality of fastpages or quarto.
- Jupyter Books seems to be the best option within ONS
- Maybe not as good as quarto but it works!
Others
- I know some people use Sphinx
- It is recommended by QA
- From what I can tell Sphinx on its own is not as easy to use as notebooks
- But there is a jupyter extension nbsphinx
- Jupyter Books uses Sphinx heavily under the hood
- nbdev
- I think it is connected to quarto
- Voila
- Voilà turns Jupyter notebooks into standalone web applications.
- Looks good, bit like streamlit
- but it seems to interfere with other libraries, and I have not checked whether it works in ONS
- mercury seems similar
- Anything else people use and recommend?
Quarto Outputs
We can then create different files from this .ipynb Jupyter notebook using the following code:
quarto render testPres.ipynb --to pptx
quarto render testPres.ipynb --to pdf
quarto render testPres.ipynb --to html
quarto render testPres.ipynb --to revealjs
or for Jupyter Books:
jupyter-book build .\PesticideDocs\
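If you want to produce all of these outputs in one go, the quarto commands above can be generated from Python. This is a minimal sketch only: the helper names build_render_commands and render_all are hypothetical, and it defaults to a dry run that prints the commands rather than executing them, since quarto may not be installed.

```python
import shutil
import subprocess


def build_render_commands(notebook, formats):
    """Build one `quarto render` command (as an argument list) per output format."""
    return [["quarto", "render", notebook, "--to", fmt] for fmt in formats]


def render_all(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False and quarto is on PATH."""
    have_quarto = shutil.which("quarto") is not None
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and have_quarto:
            subprocess.run(cmd, check=False)


commands = build_render_commands(
    "testPres.ipynb", ["pptx", "pdf", "html", "revealjs"]
)
render_all(commands)  # dry run: prints the four commands listed above
```

Calling render_all(commands, dry_run=False) would run the conversions for real, assuming quarto is installed and the notebook exists in the working directory.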
Creating a webpage from this
Takes about 30 mins including installing chosen converter. (But can be done much quicker)
- create a GitHub repo for your website
- choose the converter (e.g. Jupyter Books)
- And follow their instructions
- go to settings -> Pages within the repo
- there are a few options to choose from
- Optional: add your own website url to it
Link how to do this here
Examples
Example: Documenting Code
Here is my website for my research project on pesticides in UK food.
This is not the same as documentation for a package but there are parallels
This does a few things:
- Documents the analysis steps I have taken including the code and outputs
- Useful for data transparency, usability of the code if it needs modifying/adapting, and recording why I did XYZ
- Provides a way to present the data
- There is a streamlit app, but sometimes I like to be able to see the code
Example: Discussing Code
- GP Tables example
Example: Tool to aid learning
A big area I have been using Jupyter Notebooks for is to aid learning
- If you want to understand something it helps to write it down
- Having the code next to it is a big advantage
- And if stored on github you can access it anywhere
Example: Debugging Code
- Since starting at ONS I have been working with understanding an existing project and latterly adding code to it
- The project consists of multiple python files across several folders
- My Python was good but lots of the functions and their usage weren’t immediately obvious to me
- Break-points in VS Studio are really good for stepping through the code and working out what happens in it.
- I had not used them before with Python (but had a lot with MATLAB), and they’re really useful
- But what you can do with them can be limited
- it is difficult to probe code if you want to write more than 1 line of code
- the experience/knowledge exists as you go through it but there is no documentation to refer to later, e.g. function X does this when I give it Y etc
- By copying and pasting code into Jupyter cells I could see and document how they worked (e.g. changing inputs)
- This (copying and pasting) would get around code changes too (which would be an issue if modules were just imported)
- because this was all done in a Jupyter notebook I can have an ipynb code file and an html file showing how the code works
- I could even save a pickle file of the variables at a particular point to understand how the code would work from this point
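The pickle idea in the last bullet can be sketched as follows. This is an illustration only: the helper names save_state and load_state, the file name, and the variables being saved are all hypothetical, and a temporary folder stands in for wherever you would keep the snapshot.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(path, **variables):
    """Save the named variables to a pickle file as a dictionary."""
    with open(path, "wb") as f:
        pickle.dump(variables, f)


def load_state(path):
    """Load the dictionary of variables back from the pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Snapshot some (made-up) variables at the point of interest, then reload them.
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot_path = Path(tmp_dir) / "debug_snapshot.pkl"
    save_state(snapshot_path, threshold=0.5, sample_ids=[101, 102, 103])
    state = load_state(snapshot_path)

print(state["sample_ids"])
```

A later notebook session can then start from the reloaded variables instead of re-running everything before that point. Only unpickle files you created yourself or otherwise trust, as loading a pickle can execute arbitrary code.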
Presenting in multiple formats
- Jupyter notebooks can be used on their own or as html
- But can also be used to create presentations, pdf/word documentation or even books
- This presentation was done with Quarto using the revealjs format
- So it is a presentation format but with an html file
- Some of these file types can be difficult to produce within the ONS framework
- I hit a wall when trying to go beyond html and docs with Jupyter books due to dependencies
Presenting in multiple formats: video
Questions/ Comments
- Thoughts on:
- using notebooks
- documenting code
- encouraging communication of code across ONS areas and experiences
- Can we share html files? Or do we have to work within the current framework?
- Anything else?