Using Jupyter Notebooks in Atom

By default, Jupyter generates a new random token each time you start the notebook server. You can set a fixed token instead.

  1. Generate a config file for Jupyter

--> jupyter notebook --generate-config
Writing default config to: /Users/stephen/.jupyter/jupyter_notebook_config.py

  2. Edit jupyter_notebook_config.py and find the line that says #c.NotebookApp.token = '<generated>'. Change it to c.NotebookApp.token = 'my_secret_token', substituting your own token string. (If you skip this step, the token will change every time the notebook server restarts.)

## Token used for authenticating first-time connections to the server.
#
#  When no password is enabled, the default is to generate a new, random token.
#
#  Setting to an empty string disables authentication altogether, which is NOT
#  RECOMMENDED.
#c.NotebookApp.token = '<generated>'
c.NotebookApp.token = 'my_secret_token'

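  3. In Atom, open the Hydrogen package settings and paste the following into the Kernel Gateways field. The token must match the one set in jupyter_notebook_config.py.

[{
 "name": "Remote notebook",
 "options": {
   "baseUrl": "http://localhost:8888",
   "token": "my_secret_token"
 }
}]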
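
A fixed token should still be hard to guess. As a minimal sketch, the following uses Python's standard secrets module to generate one, then prints both the config line and a Hydrogen gateway entry that carries the same token. The helper name make_gateway_entry is hypothetical, and the base URL assumes the server runs locally on port 8888 as in the example above.

```python
import json
import secrets


def make_gateway_entry(token, base_url="http://localhost:8888"):
    """Build a Hydrogen gateway list (hypothetical helper) that uses the given token."""
    return [{"name": "Remote notebook",
             "options": {"baseUrl": base_url, "token": token}}]


# 24 random bytes rendered as 48 hexadecimal characters.
token = secrets.token_hex(24)

# Line to put in jupyter_notebook_config.py.
config_line = "c.NotebookApp.token = {!r}".format(token)
print(config_line)

# JSON to paste into the Hydrogen settings; the token is identical.
gateway_json = json.dumps(make_gateway_entry(token), indent=1)
print(gateway_json)
```

Generating both strings from one variable avoids the easy mistake of having one token in the Jupyter config and a different one in Atom.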

Learning Stats

I have learnt the hard way that for me to learn something new, I must practice what I am learning.

I want to learn statistics and there is a great course on Stats taught using the R language.

I much prefer Python and pandas to R, but there aren't many good courses that teach statistics using Python. From a pedagogical viewpoint, I learn best when I make detailed notes about what I learn each week. When doing an online course you can't publish your notes on your blog because they contain the answers, so other students could cheat. Solution: publish the answers in Python on my blog. That way I also get a good overview of the strengths and weaknesses of each language.

Master Statistics with R

Statistical mastery of data analysis including inference, modeling, and Bayesian approaches.

In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions, communicate statistical results correctly, effectively, and in context without relying on statistical jargon, critique data-based claims and evaluate data-based decisions, and wrangle and visualize data with R packages for data analysis.

You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.

The first course is Introduction to Probability and Data

About the Course

This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.

Nikola etree.so problems with libxml2.2

Installing Nikola works great, but you then hit a problem with libxml2.2:

Referenced from: /Users/stephen/anaconda/envs/nikola/lib/python3.6/site-packages/lxml/etree.cpython-36m-darwin.so
    Reason: Incompatible library version: etree.cpython-36m-darwin.so requires version 12.0.0 or later, but libxml2.2.dylib provides version 10.0.0

Typing:

conda install libxml2

seems to fix the issue. Why?

Don’t know, don’t care … just thanking the gods of Stack Overflow.
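
The error message suggests that lxml was built against a newer libxml2 than the one found at load time, which would explain why updating libxml2 helps. As a hedged check, assuming lxml is the package involved, this sketch reports the libxml2 version lxml was compiled against and the one actually loaded. The function name libxml2_versions is made up for illustration.

```python
def libxml2_versions():
    """Return (compiled, runtime) libxml2 versions seen by lxml, or None if lxml fails to import."""
    try:
        from lxml import etree
    except ImportError as exc:
        # A missing package and an incompatible libxml2 library both surface as ImportError.
        print("lxml could not be imported:", exc)
        return None
    return etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION


versions = libxml2_versions()
if versions is not None:
    compiled, runtime = versions
    print("compiled against libxml2", compiled, "- running with", runtime)
```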

Using Jupyter Notebooks in Nikola

When I first tried to import a Jupyter (IPython) file into Nikola, it failed. I was quite disappointed, because it was supposed to work out of the box.

But after a little research I found that if you run nikola new_post with the --format=ipynb option, it works perfectly.

Original ipython file:

~/sites/website/trapezoid.ipynb
66590  9 Jun 04:59 trapezoid.ipynb

I ran the command: nikola new_post --title="IPython Notebook Demo" --format=ipynb --import=trapezoid.ipynb

Importing Existing Post
-----------------------
Title: IPython Notebook Demo
Scanning posts..........done!
[2016-06-09T18:20:46Z] INFO: new_post: Your post's text is at: posts/ipython-notebook-demo.ipynb

It produced the following slightly larger file:

~/sites/website/posts/ipython-notebook-demo.ipynb
66815 10 Jun 04:20 ipython-notebook-demo.ipynb

This works beautifully. All it seems to do is add 10 lines to the metadata section of the Jupyter notebook.

  "nikola": {
    "category": "",
    "date": "2016-06-10 04:20:46 UTC+10:00",
    "description": "",
    "link": "",
    "slug": "ipython-notebook-demo",
    "tags": "",
    "title": "IPython Notebook Demo",
    "type": "text"
  }

As you can see, this is a standard Nikola header block for a new post, but this time at the bottom of the file rather than the top. Below is the full metadata block, including the 10 lines added by Nikola.

 "metadata": {
  "kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
  },
  "language_info": {
    "codemirror_mode": {
     "name": "ipython",
     "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": "3.5.0"
  },
  "nikola": {
    "category": "",
    "date": "2016-06-10 04:20:46 UTC+10:00",
    "description": "",
    "link": "",
    "slug": "ipython-notebook-demo",
    "tags": "",
    "title": "IPython Notebook Demo",
    "type": "text"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
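
Because a notebook is plain JSON, the Nikola fields can be read back with the standard json module. The sketch below uses a stripped-down, made-up notebook string rather than the real trapezoid file, and only assumes the metadata -> nikola layout shown above.

```python
import json

# A stripped-down notebook document; a real one would also contain cells.
notebook_text = """
{
 "cells": [],
 "metadata": {
  "nikola": {
    "slug": "ipython-notebook-demo",
    "title": "IPython Notebook Demo",
    "type": "text"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
"""

notebook = json.loads(notebook_text)
# Nikola's post fields live under metadata -> nikola.
post_meta = notebook["metadata"].get("nikola", {})
for key in sorted(post_meta):
    print(key, "=", post_meta[key])
```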

IPython Notebook Demo

Basic Numerical Integration: the Trapezoid Rule

A simple illustration of the trapezoid rule for definite integration:

$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$


First, we define a simple function and sample it between 0 and 10 at 200 points

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [2]:
def f(x):
    return (x-3)*(x-5)*(x-7)+85

x = np.linspace(0, 10, 200)
y = f(x)

Choose a region to integrate over and take only a few points in that region

In [3]:
a, b = 1, 8 # the left and right boundaries
N = 5 # the number of points
xint = np.linspace(a, b, N)
yint = f(xint)

Plot both the function and the area below it in the trapezoid approximation

In [4]:
plt.plot(x, y, lw=2)
plt.axis([0, 9, 0, 140])
plt.fill_between(xint, 0, yint, facecolor='gray', alpha=0.4)
plt.text(0.5 * (a + b), 30,r"$\int_a^b f(x)dx$", horizontalalignment='center', fontsize=20);

Compute the integral both at high accuracy and with the trapezoid approximation

In [5]:
from __future__ import print_function
from scipy.integrate import quad
integral, error = quad(f, a, b)
integral_trapezoid = sum( (xint[1:] - xint[:-1]) * (yint[1:] + yint[:-1]) ) / 2
print("The integral is:", integral, "+/-", error)
print("The trapezoid approximation with", len(xint), "points is:", integral_trapezoid)
The integral is: 565.2499999999999 +/- 6.275535646693696e-12
The trapezoid approximation with 5 points is: 559.890625
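
To see how the approximation behaves as more points are used, here is a small sketch in plain Python (no NumPy) that applies the same formula to the same f on [1, 8]. The function name trapezoid and the chosen point counts are illustrative only.

```python
def f(x):
    return (x - 3) * (x - 5) * (x - 7) + 85


def trapezoid(func, a, b, n_points):
    """Trapezoid rule on n_points equally spaced samples of func over [a, b]."""
    step = (b - a) / (n_points - 1)
    xs = [a + k * step for k in range(n_points)]
    total = 0.0
    for k in range(1, n_points):
        total += (xs[k] - xs[k - 1]) * (func(xs[k]) + func(xs[k - 1]))
    return total / 2


for n_points in (5, 50, 500):
    print(n_points, "points:", trapezoid(f, 1, 8, n_points))
```

With 5 points this reproduces the 559.890625 printed above, and the result moves towards the quad value of 565.25 as the number of points grows.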

Migrating from Wordpress to Nikola

  1. Export an XML dump of the WordPress site.
  2. Use nikola import_wordpress to import each blog entry in the WordPress XML file into a Markdown file with a metadata header about the post.

Run the command nikola import_wordpress

[stephen@macbook.local] Thu Jun 09 ~/Downloads/stephenhucker.com
[17] 17:18:23--> nikola import_wordpress --one-file posts.xml -o out
[2016-06-09T07:19:49Z] WARNING: import_wordpress: You specified additional arguments (['out']). Please consider putting these arguments before the filename if you are running into problems.
ERROR: Error parsing Command: option -o not recognized (parsing options: ['-q', '-o'])
[2016-06-09T07:19:51Z] INFO: Nikola: Configuration will be written to: -o/conf.py

I obviously got some of the command syntax wrong, but I don't care because it did what I wanted.

The --one-file option is important. If you don't use it, you get two files for each blog post: one with the Markdown of the post's content, and another containing the metadata about the post.

anyong-hello-kitty.md

anyong-hello-kitty.meta

Metadata header generated from the WordPress blog entry:

<!-- 
.. title: anyong hello Kitty
.. slug: anyong-hello-kitty
.. date: 2006-01-29 04:16:20
.. tags: 
.. category: 
.. link: 
.. description: 
.. type: text
.. wp-status: publish
-->

BODY OF WORDPRESS POST - Very often the body of the post will contain HTML elements, because WordPress formatting can be a bit of a mess and this is the easiest way to extract the body of the post with its formatting.

<html><body><p>The ultimate example of Korean and Japanese cultural cooperation.

<img alt="Korean Hello Kitty" title="Korean Hello Kitty" src="http://static.flickr.com/30/89653293_99d7f40a8c.jpg"></p></body></html>
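
The metadata header is simple enough to read back with a regular expression. This is only a sketch of the ".. key: value" layout shown above, not Nikola's own parser; parse_header and the sample text are made up for illustration.

```python
import re

post_text = """<!--
.. title: anyong hello Kitty
.. slug: anyong-hello-kitty
.. date: 2006-01-29 04:16:20
.. tags: 
.. type: text
.. wp-status: publish
-->

<html><body><p>Post body goes here.</p></body></html>
"""

# Each metadata line looks like ".. key: value"; the value may be empty.
FIELD = re.compile(r"^\.\.\s+([\w-]+):[ \t]*(.*)$", re.MULTILINE)


def parse_header(text):
    """Return the '.. key: value' fields found before the closing '-->'."""
    end = text.find("-->")
    header = text[:end] if end != -1 else ""
    return {key: value.strip() for key, value in FIELD.findall(header)}


meta = parse_header(post_text)
print(meta["title"], "|", meta["slug"], "|", meta["date"])
```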

Nikola

Under Construction

Setting up Nikola.

In Pelican and Nikola, the regular Markdown syntax to insert an image is:

![Under Construction](/images/under_construction.jpg)

I always keep my images in a folder called images. I want this folder to be copied verbatim to the output folder (which I call output).

The source of the images folder is:

-->Files

---->images

------>under_construction.jpg

which means under_construction.jpg will be copied to

-->output

---->images

------>under_construction.jpg

From how-to-insert-pictures-into-posts-in-nikola

If you want Nikola to recognise Markdown and ipynb posts, then conf.py must contain a reference to .md and .ipynb.

e.g. for Markdown, ("posts/*.md", "posts", "post.tmpl"), has to be in POSTS or PAGES:

POSTS = (
    ("posts/*.rst", "posts", "post.tmpl"),
    ("posts/*.md", "posts", "post.tmpl"),
    ("posts/*.ipynb", "posts", "post.tmpl"),
    ("posts/*.txt", "posts", "post.tmpl"),
    ("posts/*.html", "posts", "post.tmpl"),
)
PAGES = (
    ("stories/*.rst", "stories", "story.tmpl"),
    ("stories/*.md", "stories", "story.tmpl"),
    ("stories/*.ipynb", "stories", "story.tmpl"),
    ("stories/*.txt", "stories", "story.tmpl"),
    ("stories/*.html", "stories", "story.tmpl"),
)

Ref: Nikola Handbook
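
Each entry is a (source pattern, destination folder, template) tuple. As a rough illustration of how a file path could be matched against such patterns, here is a sketch using Python's fnmatch. It is not Nikola's implementation, and find_rule is a hypothetical helper.

```python
from fnmatch import fnmatch

# Same shape as the POSTS setting: (source pattern, destination folder, template).
POSTS = (
    ("posts/*.rst", "posts", "post.tmpl"),
    ("posts/*.md", "posts", "post.tmpl"),
    ("posts/*.ipynb", "posts", "post.tmpl"),
)


def find_rule(path, rules):
    """Return the first (pattern, destination, template) whose pattern matches path."""
    for pattern, destination, template in rules:
        if fnmatch(path, pattern):
            return pattern, destination, template
    return None


for path in ("posts/hello.md", "posts/demo.ipynb", "posts/notes.docx"):
    print(path, "->", find_rule(path, POSTS))
```

A file whose extension has no matching pattern, like the .docx example, gets no rule at all, which is why .md and .ipynb need their own entries.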

Misc Nikola commands

nikola plugin --list-installed

Lists all installed plugins

nikola  new_post -f markdown

creates a new post in Markdown rather than the default RST

[stephen@macbook.local] Thu Jun 09 ~/sites/website/output
[15] 14:09:30--> nikola check -f --clean-files
Scanning posts..........done!
[2016-06-09T04:09:43Z] WARNING: check: Files from unknown origins (orphans):
[2016-06-09T04:09:43Z] WARNING: check: output/test.txt

Checks whether everything in the output folder can be generated from the posts (and pages?) folders. The -f --clean-files options remove any file, such as test.txt, that can't be generated from the input folders (pages, posts). In this case I had created test.txt manually.
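
The idea of an orphan can be expressed as a set difference: files found in the output folder minus files the build would produce. The sketch below uses hypothetical file listings and a made-up helper, find_orphans; it only illustrates the concept and says nothing about how nikola check works internally.

```python
def find_orphans(output_files, generated_files):
    """Files present in the output folder that no source would generate."""
    return sorted(set(output_files) - set(generated_files))


# Hypothetical listings: what is on disk versus what the build would produce.
on_disk = ["output/index.html", "output/posts/demo.html", "output/test.txt"]
expected = ["output/index.html", "output/posts/demo.html"]

print(find_orphans(on_disk, expected))
```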

economist

How to use the free software package Calibre to create an ePub of The Economist each week.

Will be done by Tuesday 7th of June at 9 am at the latest.

Studying

Just finished weeks 4, 5 and 6 of the Coursera Programming for Everybody (Python) course. It’s a pretty slow-paced course, which I don’t mind because it gives me a chance to play around with the ideas presented.

Google Photos

Google Photos was released just two days ago.
It offers unlimited storage for photos of up to 16 megapixels.
Great! My m4/3 camera’s sensor is only 16 megapixels, so I should be able to store all my photos for free :-)

But, it mentions compression.

So I upload a picture:

Original file size: 6.78 MB, dimensions: 4,592 x 2,584

I then downloaded it and noticed the file size was much less:

Google file size: 1.94 MB, dimensions: 4,592 x 2,584

After comparing the images I couldn’t discern any quality loss.
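
For a sense of scale, the numbers above can be turned into a pixel count and a size ratio. This is just arithmetic on the figures quoted in this post:

```python
width, height = 4592, 2584
original_mb, google_mb = 6.78, 1.94

# Pixel count in millions, and downloaded size as a fraction of the original.
megapixels = width * height / 1e6
size_ratio = google_mb / original_mb

print("Resolution: {:.2f} megapixels".format(megapixels))
print("Downloaded file is {:.0%} of the original size".format(size_ratio))
```

So the picture is about 11.9 megapixels, and the file that came back is roughly 29% of the size of the one I uploaded, at the same pixel dimensions.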