Amit's (old) blog >> go to akgupta.ca

Math, Code, etc.

Is Programming Math? You're Asking the Wrong Question. Part 2

In Part 1 I started to address Sarah Mei's recent blog post about how programming is not math. In this post, I'll continue my critique of her article and highlight the ways in which math can be valuable to a programmer.

Understanding math to understand programming problems

In her article, Sarah attacks the claim that "without a mathematical foundation, you’ll have only a surface understanding of programming" by first reducing it to an alleged "common variation" of the same sentiment: "without a CS degree, you can’t build anything substantial." But wait, those aren't the same thing at all.

Is Programming Math? You're Asking the Wrong Question. Part 1

A couple days ago, my former colleague Sarah Mei trolled nerds everywhere, claiming programming is not math. In her post, she argues that the value proffered to a programmer by learning/knowing/doing math is greatly diminished compared to what widely held beliefs about math and programming would suggest. In particular, she claims that:

  1. Most developers don't require any applications of math in nearly all of the problems they need to solve.
  2. Understanding math-flavoured concepts from computer science, such as big O analysis, has little correlation with a solid understanding of the problems most developers need to solve.
  3. While programmers do require abstract thinking and problem solving skills, there is nothing special about math as a way to develop these skills.

So the question that's truly of interest here is not whether programming is math, but

In what ways, and to what extent, does learning and knowing math and computer science help programmers solve problems?

k-NN Algorithm in Golang and Haskell

There's been a recent string of blog posts featuring implementations of the k-Nearest Neighbour algorithm in several languages (for k = 1), and it's been cool to see how the solutions are expressed differently in different languages, as well as the vast differences in performance. The k-Nearest Neighbour algorithm is cool in itself, it's a dead simple technique for classifying (or labelling) a collection of observed data by searching the previously observed collections of data whose classifications/labels are already known, and finding the one which most nearly resembles our not-yet-classified collection (for some definition of "nearly resembles"). It's an example of a machine learning classification algorithm, one of those things that lives in the fun intersection of math and programming.

We're gonna take a look at a concrete application of the k-NN algorithm, compare the performance of the implementations from those aforementioned blog posts with new implementations in Golang and Haskell, and take a look at an optimized version which takes a logical shortcut and also leverages Golang's built-in support for concurrency.

All the code and datasets can be found on Github. The Golang and Haskell code is also at the bottom of this post.

TL;DR: Golang wins, or, in honor of the World Cup: GOOOOOOOOOOLLLLLLLang!!!

So You Still Don't Understand Hindley-Milner? Part 3

In Part 2, we finished defining all the formal terms and symbols you see in the StackOverflow question on the Hindley-Milner algorithm, so now we're ready to translate what that question was asking about, namely the rules for deducing statements about type inference. Let's get down to it!

The rules for deducing statements about type inference

[Var]


$\underline{x:\sigma \in \Gamma}$

Γ  ⊢ x: σ

This translates to: If "x has type σ" is a statement in our collection of statements Γ , then from Γ  you can infer that x has type σ. Here x is a variable (hence the name of this rule of inference). Yes, it should sound that painfully obvious. The terse, cryptic way that [Var] is expressed isn't that way because it contains some deep, difficult fact. It's terse and succinct so that a machine can understand it and type inference can be automated.

So You Still Don't Understand Hindley-Milner? Part 2

In Part 1, we said what the building blocks of the Hindley-Milner formalization would be, and in this post we'll thoroughly define them, and actually formulate the formalization:

Formalizing the concept of an expression

We'll give a recursive definition of what an expression is; in other words, we'll state what the most basic kind of expression is, we'll say how to create new, more complex expressions out of existing expressions, and we'll say that only things made in this way are valid expressions.

Entropy: How Password Strength Is Measured

My colleague Mike Sierchio wrote a cool post on password strength, and the concept of entropy. As he points out, entropy isn't always entropy. That confusion is apparently not uncommon, as it's been asked about on IT Security Stack Exchange as well. So what's really going on?

Let's step back for a sec and fill in some context. What are we trying to do? We'd like some way to measure how hard it is to guess our passwords, a number that serves as a heuristic standard of password strength. But there are two fundamentally different things we might want to measure:

  1. How hard would it be for someone to guess your password with essentially no knowledge of how you created your password?

  2. How hard would it be for someone to guess your password if they knew the process used to generate it? This is of course assuming that there is a process, for example some script that does some Math.rand-ing and produces a password string.

The term "entropy" has been used to refer to both kinds of calculations, but they're clearly entirely different things: the former essentially takes a string as input, the latter takes a random process as input. Hence, "entropy is not entropy."

Alright, well if entropy isn't entropy, let's see what entropies are. We'll look at the standard mathematical formulation of the random-process-entropy which comes from information theory. And we'll look at the function used to calculate particular-string-entropy in some password strength tester (e.g. http://rumkin.com/tools/password/passchk.php). And that's all we're going to do, we'll look at how the calculations are done, without dwelling too much on the differences between the two approaches or what their use cases are.

Yo Dawg, I Herd You Like Math

I've been learning a bit of statistical computing with R lately on the side from Chris Paciorek's Berkeley course. I just got introduced to knitr and it's damned sweet! It's an R package which takes a LaTeX file with embedded R, and produces a pure LaTeX file (similar to how Rails renders an .html.erb file into a .html file), where the resulting LaTeX file has the output of the R code. It makes it super easy to embed statistical calculations, graphs, and all the good stuff R gives you right into your TeX files. It let's you put math in your math, so you can math while you math.

I've got a little project which:

  1. Runs a Python script which will use Selenium to scrape a web page for 2012 NFL passing statistics.
  2. "Knits" a TeX file with embedded R that cleans the raw scraped data, produces a histogram of touchown passes for teams, and displays the teams with the least and greatest number of touchdowns.
  3. Compiles the resulting TeX file and opens the resulting PDF.
  4. Cleans up any temporary work files.

Is It Possible to Be 15% Swedish?

This question came up as a joke during a team standup a few months ago. Although the obvious answer is "no," if you're willing to play fast and loose with your metaphysics for a bit, the answer can be "yes" and there's a cute solution that ties together binary numbers and binary trees. This post itself is a bit of a joke in that it's just for fun, but it might be nice to see the familiar concepts of binary numbers and binary trees in a new light.

The obvious answer is "no"

Let's quickly see why the real life answer is "no." But first we should lay out the assumptions implicit in the problem. We're going to assume that at some point in time, everyone was either entirely Swedish or entirely non-Swedish. There's a chicken-and-egg problem that we're sweeping under the rug here, but that's what rugs are for. Next we're assuming that every person after that point in time has their Swedishness wholly and equally determined by their parents Swedishness. So if mom is 17% Swedish and dad is 66% Swedish, then baby is ½ x 17% + ½ x 66% = 41.5% Swedish.

So You Still Don't Understand Hindley-Milner? Part 1

I was out for drinks with Josh Long and some other friends from work, when he found out I "speak math." He had come across this StackOverflow question and asked me what it meant:



Before we figure out what it means, let's get an idea for why we care in the first place. Daniel Spiewak's blog post (link broken) gives a really nice explanation of the purpose of the HM algorithm, in addition to an in-depth example of its application:

Functionally speaking, Hindley-Milner (or “Damas-Milner”) is an algorithm for inferring value types based on use.  It literally formalizes the intuition that a type can be deduced by the functionality it supports.

Okay, so we want to formalize an algorithm for inferring types of any given expression. In this post, I'm going to touch on what it means to formalize something, then describe the building blocks of the HM formalization. In Part 2, I'll flesh out the building blocks of the formalization. Finally in Part 3, I'll translate that StackOverflow question.

Setting Up PostgreSQL With MacPorts for Ruby on Rails Development

This post is pretty much just what the title says. If you develop in Rails, on a Mac, using PostgreSQL as your database and MacPorts as your package management system of choice, it can be hard to get everything set up and going. This is especially true if you're new to absolutely everything, which was my situation when I first tried to do this. Part of what was hard was that most of the advice on the Internet was just a bunch of commands to run with little explanation, so it was hard to know what you had to do to customize it for your environment. So this explanation is hopefully a little more thorough. However, if you're impatient and just want the TL;DR, here it is (with several implicit assumptions):

1
2
3
4
5
6
7
8
9
10
11
12
13
sudo port install postgresql92-server
sudo mkdir -p /opt/local/var/db/postgresql92/defaultdb
sudo chown postgres:postgres /opt/local/var/db/postgresql92/defaultdb
sudo su postgres -c '/opt/local/lib/postgresql92/bin/initdb -D /opt/local/var/db/postgresql92/defaultdb'
sudo su postgres
/opt/local/lib/postgresql92/bin/pg_ctl -D /opt/local/var/db/postgresql92/defaultdb/ -l /opt/local/var/db/postgresql92/defaultdb/server.log start
exit
echo 'PATH=/opt/local/lib/postgresql92/bin/:$PATH' >> ~/.bashrc
source ~/.bashrc
rails new my_app --database=postgresql
sed -e 's/username: my_app/username: postgres/g' -i '' my_app/config/database.yml
cd my_app
rake db:create:all