Building TensorFlow Systems from Components
Tuesday May 9, 2017
These are materials for a workshop given Tuesday May 9, 2017 at OSCON. (slides)
Workshop participant materials
These are things you can use during the workshop. Use them when you need them.
- Focus one: getting data in
- Download notebook: start state / end state
- View notebook on GitHub: start state / end state
- mystery.tfrecords
- Focus two: distributed programs
- Focus three: high-level ML APIs
- Download notebook: start state / end state
- View notebook on GitHub: start state / end state
- Simple Regression with a TensorFlow Estimator
- Canned Models with Keras in TensorFlow
The whole slide deck and everything follows, and it's long.
Thank you!
Before starting, I want to take a moment to notice how great it is to have a conference. Thank you for being here! We are all really lucky. If this isn't nice, I don't know what is.
@planarrowspace
Hi! I'm Aaron. This is my blog and my twitter handle. You can get from one to the other. This presentation and a corresponding write-up (you're reading it) are on my blog (which you're on).
If you want to participate in the workshop, really go to planspace.org
and pull up these materials!
I work for a company called Deep Learning Analytics.
I'll talk more about DLA tomorrow as part of TensorFlow Day. Hope to see you there as well!
$ pip install --upgrade tensorflow
$ pip freeze | grep tensorflow
## tensorflow==1.1.0
You'll want to have TensorFlow version 1.1 installed. TensorFlow 1.1 was only officially released on April 20, and the API really has changed.
The core documented API, mostly the low-level API, is frozen as of TensorFlow 1.0, but a lot of higher-level stuff is still changing. TensorFlow 1.1 brings in some really neat stuff, like Keras, which we'll use later.
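You can also check the installed version from inside Python; this is standard TensorFlow, nothing assumed:

import tensorflow as tf

# Check the installed version from inside Python.
print(tf.__version__)
## 1.1.0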
As an example: I wrote Hello, TensorFlow! about a year ago. It used TensorFlow 0.8.
After TensorFlow 1.0 came out, I went and looked at the summary code block at the end of "Hello, TensorFlow!". It had 15 lines of TensorFlow code. Of that, 5 lines no longer worked. I had to update 33% of the lines of my simple TensorFlow example code because of API changes.
I was surprised that it was one third.
I have an updated code snippet, but we're not doing "Hello, TensorFlow!" today.
$ pip install --upgrade tensorflow
$ pip freeze | grep tensorflow
## tensorflow==1.1.0
So please have TensorFlow 1.1 installed, is the point of that whole story.
The stable bits of TensorFlow really are stable, but there's a lot of exciting new stuff, and I don't want you to miss out today!
THE BIG IDEA
Okay! What is this workshop about?
use what you need
TensorFlow works for you! Use it where it does things that help you.
TensorFlow is a tool. It's a very general tool, on the one hand. It's also a tool with lots of pieces. You can use some of the pieces. Or you can decide not to use TensorFlow altogether!
We'll look at several specific aspects of TensorFlow. Maybe you'll want to use them. Maybe you won't. The hope is that you'll be more comfortable with what's available and able to decide what to apply when.
If you like Rich Hickey words, maybe I'm trying to decomplect the strands within TensorFlow so they can be understood individually.
THE PLAN
Here's the plan for this workshop.
- one short work
- three longer works
I'll talk about some things, but the hope of the workshop is that you do some good work.
I hope that you don't finish all the things you think of, but want to keep working after the workshop is over.
Also, there's a break from 10:30 to 11:00 on the official schedule, but I don't really like that. If we happen to be working during the break, fine. Feel free to take a break whenever you need a break.
Do what you want!
TensorFlow is really big, and not every part will be interesting or important to you.
I'll have very specific things for you to work on for each of the works, but if you think of something better to do, you better do it!
Intro by Logo
To get creative juices flowing a little, let's explore some logo history.
Not that sort of Logo!
Fine to think about, though.
In the beginning, there was Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter.
This "metaphorical fugue on minds and machines in the spirit of Lewis Carroll" includes the strange loop idea and related discussion of consciousness and formal systems. Hofstadter's influence can be seen, for example, in the name of the Strange Loop tech conference.
It seems likely that many people working in artificial intelligence and machine learning have encountered Gödel, Escher, Bach.
Of course, there's also Chernin Entertainment, the production company.
So we shouldn't rule out the possibility that Google engineers are fans of Chernin's work. Hidden Figures is quite good. And I guess a lot of people like New Girl?
In any event, somehow we get to this TensorFlow logo:
If you look carefully, does it seem like the right side of the "T" view is too short?
This very serious concern appears as issue #1922 on the TensorFlow GitHub, and on the TensorFlow mailing list, complete with an ASCII art illustration.
The consensus response seemed to be some variant of "won't fix" (it wouldn't look as cool, anyway) until...
As of around the 1.0 release of TensorFlow, which was around the first TensorFlow Dev Summit, this logo variant seems to be in vogue. It removes the (possibly contentious) shadows, and adds additional imagery in the background.
I want to suggest that the image in the background can be a kind of Rorschach test for at least three ways you might be thinking about TensorFlow.
- Is it a diagram of connected neurons? Maybe you're interested in TensorFlow because you want to make neural networks.
- Is it a diagram of a computation graph? Maybe you're interested in TensorFlow for general calculation, possibly using GPUs.
- Is it a diagram of multiple computers in a distributed system? Maybe you're interested in TensorFlow for its distributed capabilities.
There can be overlap among those interpretations as well, but I hope the point is not lost that TensorFlow can be different things to different people.
short work
I want to get you thinking about systems without further restriction. The idea is to imagine and start fleshing out a system that might involve TensorFlow.
Later on we'll follow the usual workshop pattern of showing you something you can do and having you do it. But it's much more realistic and interesting to start from something you want to do and then try to figure out how to do it. So let's start from the big picture.
draw a system!
- block diagram
- add detail
- pseudocode?
You can draw a system you've already made, or something you're making, or something you'd like to make. It could be something you've heard about, or something totally unique.
You don't have to know how to build everything in the system. You don't need to know how TensorFlow fits in. Feel free to draw what you wish TensorFlow might let you do.
Keep adding more detail. If you get everything laid out, start to think about what functions or objects you might need to have, and start putting pseudocode together.
short work over
Moving right along...
Let's do hard AI!
I'm going to walk through an example and talk about how TensorFlow can fit in.
Doing hard AI, or Artificial General Intelligence (AGI), is intended as a joke, but it's increasingly less so, with Google DeepMind and OpenAI both explicitly working on getting to AGI.
(build)
I've got 35 slides of sketches, so they're in a separate PDF to go through.
Here's the final system that we built to.
We're not going to do everything in there today.
- getting data in
- distributed programs
- high-level ML APIs
These are the three focuses for today.
The original description for this workshop also mentioned serving, but I'm not covering it today. Sorry. Not enough time.
What even is TensorFlow?
Here's one way to think about TensorFlow.
The beating heart of TensorFlow is the Distributed Execution Engine, or runtime.
One way to think of it is as a virtual machine whose language is the TensorFlow graph.
That core is written in C++, but lots of TensorFlow functionality lives in the frontends. In particular, there's a lot in the Python frontend.
(Image from the TensorFlow Dev Summit 2017 keynote.)
Python is the richest frontend for TensorFlow. It's what we'll use today.
You should remember that not everything in the Python TensorFlow API touches the graph. Some parts are just Python.
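For example, building graph nodes computes nothing by itself; only running them in a session does the work. Here's a minimal sketch of that distinction:

import tensorflow as tf

# These lines only add nodes to the default graph; nothing is computed yet.
x = tf.constant([1.0, 2.0])
y = x * 2.0

# Running the graph in a session is what actually does the work.
with tf.Session() as sess:
    print(sess.run(y))  # [ 2.  4.]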
R has an unofficial TensorFlow API, which is interesting in that it just wraps the Python API: it's really an R API to the Python API to TensorFlow. When you use it, you write R, but Python runs. This isn't the way the TensorFlow project encourages language bindings to be built, but it's out there.
The way TensorFlow encourages API development is via TensorFlow's C bindings.
You could use C++, of course.
Also there's Java.
And there's Go.
And there's Rust.
And there's even Haskell!
Currently, languages other than Python have TensorFlow support that is very close to the runtime. Basically make your ops and run them.
This is likely to be enough support to deploy a TensorFlow system in whatever language you like, but if you're developing and training systems you probably still want to use Python.
graph or not graph?
So this is a distinction to think about: Am I using the TensorFlow graph, or not?
- getting data in
So here we are at the first focus area.
We'll do two sub-parts.
- getting data in
- to the graph
- TFRecords
First, a quick review of the graph and putting data into it.
Second, a bit about TensorFlow's TFRecords format.
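As a small taste before the notebook, here's a minimal sketch of writing and reading a TFRecords file without touching the graph; the filename and the 'color' feature are made up for illustration:

import tensorflow as tf

# Build one Example protocol buffer with a single (made-up) bytes feature.
example = tf.train.Example(features=tf.train.Features(feature={
    'color': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'red'])),
}))

# Write it out, then read it back; this is all plain Python I/O.
with tf.python_io.TFRecordWriter('example.tfrecords') as writer:
    writer.write(example.SerializeToString())

for record in tf.python_io.tf_record_iterator('example.tfrecords'):
    print(tf.train.Example.FromString(record))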
(notebook)
There's a Jupyter notebook to talk through at this point.
- Download: start state / end state
- View on GitHub: start state / end state
long work
It's time to do some work!
work with data!
- your own system?
- planspace.org: mystery.tfrecords
- move to graph
Option one is always to work on your own system. Almost certainly there's some data input that needs to happen. How are you going to read that data, and possibly get it into TensorFlow?
If you want to stay really close to the TensorFlow stuff just demonstrated, here's a fun little puzzle for you: What's in mystery.tfrecords?
That could be hard, or it could be easy. If you want to extend it, migrate the reading and parsing of the TFRecords file into the TensorFlow graph. The demonstration in the notebook worked with TFRecords/Examples without using the TensorFlow graph. The links on the mystery.tfrecords page can help with this. A sketch of the graph-side approach follows, if you want a head start.
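Here's a minimal sketch of reading and parsing inside the graph. It assumes, purely for illustration, a single bytes feature named 'color'; the actual contents of mystery.tfrecords are yours to discover:

import tensorflow as tf

# A queue of filenames feeds a reader that yields serialized Examples.
filename_queue = tf.train.string_input_producer(['mystery.tfrecords'])
reader = tf.TFRecordReader()
_, serialized = reader.read(filename_queue)

# Parsing also happens in the graph; this feature spec is a guess.
parsed = tf.parse_single_example(
    serialized, features={'color': tf.FixedLenFeature([], tf.string)})

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(parsed['color']))
    coord.request_stop()
    coord.join(threads)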
long work over
Moving right along...
As a wrap-up: Working directly with data, trying to get it into the right shape, cleaning it, etc., may not be the most fun, but it's got to be done. Here's a horrible pie chart from some guy on Forbes, the point of which is that people spend a good deal of time fighting with data.
- distributed programs
We arrive at the second focus area. TensorFlow has some pretty wicked distributed computing capabilities.
- distributed programs
- command-line arguments
- MapReduce example
I'm putting a bit about command-line arguments in here because I think it's interesting. It doesn't necessarily fit in with distributed computing, although you might well have a distributed program that takes command-line arguments. Google Cloud ML can use command-line arguments for hyperparameters, for example.
Then we'll get to a real distributed example, in which we implement a distributed MapReduce word count in 50 lines of Python TensorFlow code.
command-line arguments
So let's take a look at some ways to do command-line arguments.
This section comes from the post Command-Line Apps and TensorFlow.
$ python script.py --color red
a red flower
I'll show eight variants that all do the same thing. You provide a --color argument, and it outputs (in text) a flower of that color.
import sys

def main():
    assert sys.argv[1] == '--color'
    print('a {} flower'.format(sys.argv[2]))

if __name__ == '__main__':
    main()
This is a bare-bones sys.argv method. Arguments become elements of the sys.argv list, and can be accessed as such. This has limitations when arguments get more complicated.
import sys
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_string(flag_name='color',
                    default_value='green',
                    docstring='the color to make a flower')

def main():
    flags.FLAGS._parse_flags(args=sys.argv[1:])
    print('a {} flower'.format(flags.FLAGS.color))

if __name__ == '__main__':
    main()
This FLAGS API is familiar to Googlers, I think. It's interesting to me where some Google-internal things peek out from the corners of TensorFlow.
import sys
import gflags

gflags.DEFINE_string(name='color',
                     default='green',
                     help='the color to make a flower')

def main():
    gflags.FLAGS(sys.argv)
    print('a {} flower'.format(gflags.FLAGS.color))

if __name__ == '__main__':
    main()
You could also install the gflags module, which works much the same way.
import sys
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color',
                    default='green',
                    help='the color to make a flower')

def main():
    args = parser.parse_args(sys.argv[1:])
    print('a {} flower'.format(args.color))

if __name__ == '__main__':
    main()
Here's Python's own standard argparse, set up to mimic the gflags example, still using sys.argv explicitly.
import tensorflow as tf

def main(args):
    assert args[1] == '--color'
    print('a {} flower'.format(args[2]))

if __name__ == '__main__':
    tf.app.run()
Using tf.app.run(), another Google-ism, frees us from accessing sys.argv directly.
import tensorflow as tf

flags = tf.app.flags
flags.DEFINE_string(flag_name='color',
                    default_value='green',
                    docstring='the color to make a flower')

def main(args):
    print('a {} flower'.format(flags.FLAGS.color))

if __name__ == '__main__':
    tf.app.run()
We can combine tf.app.run() with tf.app.flags.
import google.apputils.app
import gflags

gflags.DEFINE_string(name='color',
                     default='green',
                     help='the color to make a flower')

def main(args):
    print('a {} flower'.format(gflags.FLAGS.color))

if __name__ == '__main__':
    google.apputils.app.run()
To see the equivalent outside the TensorFlow package, we can combine gflags and google.apputils.app.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--color',
                    default='green',
                    help='the color to make a flower')

def main():
    args = parser.parse_args()
    print('a {} flower'.format(args.color))

if __name__ == '__main__':
    main()
This has all been fun, but here's what you should really do. Just use argparse.
Whew!
That was a lot of arguing.
It may be worth showing all these because you'll encounter various combinations as you read code out there in the world. More recent examples tend to use argparse, but the other variants are still out there as well.
MapReduce example
Now to the MapReduce example!
Recall that the core of TensorFlow is a distributed runtime. What does that mean?
tf.device()
Behold, the power of tf.device()!
with tf.device('/cpu:0'):
    # Do something.
You can specify that work happen on a local CPU.
with tf.device('/gpu:0'):
    # Do something.
You can specify that work happen on a local GPU.
with tf.device('/job:ps/task:0'):
    # Do something.
In exactly the same way, you can specify that work happen on a different computer!
This is pretty amazing. I think of it as sort of declarative MPI.
TensorFlow automatically figures out when it needs to send information between devices, whether they're on the same machine or on different machines. So cool!
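Here's a minimal sketch of a two-task cluster squeezed into a single process; the localhost ports are made up, and normally each task would be its own process on its own machine:

import tensorflow as tf

# Define a cluster with one parameter server task and one worker task.
cluster = tf.train.ClusterSpec({'ps': ['localhost:2222'],
                                'worker': ['localhost:2223']})
ps = tf.train.Server(cluster, job_name='ps', task_index=0)
worker = tf.train.Server(cluster, job_name='worker', task_index=0)

with tf.device('/job:ps/task:0'):
    x = tf.constant(3.0)  # this op lives on the ps task

with tf.device('/job:worker/task:0'):
    y = x * 2.0  # this op runs on the worker; x gets shipped over

with tf.Session(worker.target) as sess:
    print(sess.run(y))  # 6.0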
MapReduce is often associated with Hadoop. It's just divide and conquer.
So let's do it with TensorFlow!
(demo)
It's time to see how this looks in practice! (Well, or at least to see how it looks in a cute little demo.)
The demo uses the contents of the mapreduce_with_tensorflow repo on GitHub. For more explanation, see TensorFlow Clusters: Questions and Code and Distributed MapReduce with TensorFlow.
(code)
Walking through some details inside count.py.
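If you don't have the repo handy, here's a toy single-process sketch of the same divide-and-conquer shape. This is not the actual count.py; the real demo places the pieces on different machines with tf.device():

import tensorflow as tf

# Map: each "mapper" counts the words in its own shard.
shards = [tf.constant(['the', 'cat']), tf.constant(['the', 'hat'])]
counts = [tf.size(shard) for shard in shards]

# Reduce: one "reducer" sums the per-shard counts.
total = tf.add_n(counts)

with tf.Session() as sess:
    print(sess.run(total))  # 4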
long work
It's time to do some work!
make something happen!
- your own system?
- different distributed functionality?
- add command-line?
Option one is always to work on your own system. Maybe command-line args are relevant to you. Maybe running a system across multiple machines is relevant for you. Or maybe not.
If you want to exercise the things just demonstrated, you could add a command-line argument to the distributed word-count program. For example, you could make it count only a particular word, or optionally count characters, or something else. Or you could change the distributed functionality without any command-line fiddling. (You could make it a distributed neural net training program, for example.)
Here are some links that might be helpful:
- Command-Line Apps and TensorFlow
- TensorFlow Clusters: Questions and Code
- Distributed MapReduce with TensorFlow
- TensorFlow as Automatic MPI
long work over
Moving right along...
I should probably say that you don't really want to start all your distributed TensorFlow programs by hand. Containers and Kubernetes and all that.
- high-level ML APIs
Let's do some machine learning!
New and exciting things are being added in TensorFlow Python land, building up the ladder of abstraction.
This material comes from Various TensorFlow APIs for Python.
(Image from the TensorFlow Dev Summit 2017 keynote.)
The "layer" abstractions largely from TF-Slim are now appearing at tf.layers
.
The Estimators API now at tf.estimator is drawn from tf.contrib.learn work, which is itself heavily inspired by scikit-learn.
And Keras is entering TensorFlow, first as tf.contrib.keras and soon just tf.keras with version 1.2.
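As a taste of the Estimator style, here's a minimal regression sketch using the tf.contrib.learn API with made-up data:

import numpy as np
import tensorflow as tf

# Made-up data: y is an exact linear function of x.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, -1.0, -2.0, -3.0])

# Declare the input schema; the estimator handles the model itself.
feature_columns = [tf.contrib.layers.real_valued_column('x', dimension=1)]
estimator = tf.contrib.learn.LinearRegressor(feature_columns=feature_columns)

input_fn = tf.contrib.learn.io.numpy_input_fn(
    {'x': x}, y, batch_size=4, num_epochs=1000)
estimator.fit(input_fn=input_fn, steps=1000)
print(estimator.evaluate(input_fn=input_fn))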
- high-level ML APIs
- training an Estimator
- pre-trained Keras
So let's try this out!
(notebook)
There's a Jupyter notebook to talk through at this point.
- Download: start state / end state
- View on GitHub: start state / end state
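For the pre-trained Keras idea, a minimal sketch (not necessarily exactly what the notebook does) looks roughly like this; it uses tf.contrib.keras, 'elephant.jpg' is a placeholder filename, and the first run downloads ImageNet weights:

import numpy as np
import tensorflow as tf

keras = tf.contrib.keras

# Load a ResNet50 with weights pre-trained on ImageNet.
model = keras.applications.ResNet50(weights='imagenet')

# Load and preprocess one image to the network's expected input.
img = keras.preprocessing.image.load_img('elephant.jpg',
                                         target_size=(224, 224))
x = keras.preprocessing.image.img_to_array(img)
x = keras.applications.resnet50.preprocess_input(x[np.newaxis])

# Predict and decode the top classes into human-readable labels.
preds = model.predict(x)
print(keras.applications.resnet50.decode_predictions(preds, top=3))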
There's also a TensorBoard demo baked in there. A couple backup slides follow.
This is what the graph should look like in TensorBoard.
Steps per second should look something like this.
Loss should look like this.
I just enjoy this image too much not to share it.
long work
It's time to do some work!
make something happen!
- your own system?
- flip regression to classification?
- classify some images?
Option one is always to work on your own system. Do you need a model of some kind in your system? Maybe you can use TensorFlow's high-level machine learning APIs.
If you want to work with the stuff just shown some more, that's also totally cool! Instead of simple regression, maybe you want to flip the presidential GDP problem around to be logistic regression. TensorFlow has tf.contrib.learn.LinearClassifier for that. And many more variants! (There's a sketch of the classifier setup below.)
Or maybe you want to classify your own images, or start to poke around the model some more. Also good! If you want more example images, there's this set.
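If you try the classification flip, a minimal sketch with made-up features and 0/1 labels might look like this:

import numpy as np
import tensorflow as tf

# Made-up data: label is 1 when x is big, 0 when x is small.
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

feature_columns = [tf.contrib.layers.real_valued_column('x', dimension=1)]
classifier = tf.contrib.learn.LinearClassifier(
    feature_columns=feature_columns)

input_fn = tf.contrib.learn.io.numpy_input_fn(
    {'x': x}, y, batch_size=4, num_epochs=1000)
classifier.fit(input_fn=input_fn, steps=200)
print(classifier.evaluate(input_fn=input_fn))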
Here are some links that could be helpful:
long work over
Moving right along...
Oh my! Are we out of time already?
What else?
There are a lot of things we haven't covered.
debugging, optimizing (XLA, low-precision, etc.), serving, building custom network architectures, embeddings, recurrent, generative, bazel, protobuf, gRPC, queues, threading...
Here's a list of the first things that came to mind.
I hope you continue to explore!
Thanks!
Thank you!
@planarrowspace
This is just me again.