Lead/Lag

In control engineering, the concepts of stability and instability precede the design of the control algorithms themselves. As an engineering student, you study and come to understand the behavior of dynamic systems as a prerequisite to controlling them. As elementary as it is as an engineering concept, it often seems to be beyond our collective comprehension as a society, even if it is well within our individual grasp. While most of us may scratch our heads when looking at a Nyquist plot, we do understand that you’ve got to start backing off the accelerator even before you get up to your desired speed.

When I was a younger man, I had a roommate who drove a Datsun 280Z. This was a nifty little car. Price-wise, it was accessible to the young and the working class without breaking the bank. Not only was it affordable to buy, but it was affordable to own. It was a pleasure to drive and, to top it all off, it offered fairly impressive performance. A nice example of the “roadster” style.

We took a few summer road trips together and, on more than one occasion, he asked me to drive so he could take a nap, sober up, or just monkey with his cassette deck. Before letting me drive for the first time, he gave me a bit of a rundown on his car’s performance. “Don’t go over 90 mph,” he warned me, as the car would shake violently shortly after crossing the 90 mark on the speedometer. Needless to say, the car never shook itself apart while I was driving. Perhaps that was because, as a conscientious and responsible youth, I would never exceed the speed limit. Perhaps it was because he wasn’t entirely correct about the circumstances under which his car would have vibration issues.

My point is, most of us have a gut understanding of frequency response and stability and the challenges of controlling them. The seriousness of the problem is exposed in the design of mechanical systems and, in particular, those that incorporate high-frequency rotation of components. Deeper understanding and mathematical analyses are necessary prerequisites to assembling a piece of machinery that will hurtle through the night at speeds approaching 100 mph. In the case of my friend’s Datsun, as cyclic energy is fed into the system, those inputs can resonate with the spring-like elements of the system’s passive structure. Without proper analysis and design, a vehicle’s suspension might well start to exhibit extreme vibration at high speeds. The same applies to any dynamic system. We are all familiar with the violently shaking washing machine, whether we have one in our own home or not.

Naturally, the mathematics apply to non-mechanical systems as well. Often the effects are far more serious than a shaking car or a jumping washing machine. In electric circuits, resonance can produce seemingly impossibly high voltages and currents. Water hammer in a hydraulic system can crush equipment and cause explosions. The analyses that help us understand these physical phenomena, I’ll argue today, would also help us understand interactions in social systems and the effect of a “black swan” event, if we allow them to.
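To put a little math behind that intuition, here is a minimal sketch (in Python, with arbitrary made-up numbers) of the steady-state response of a driven mass-spring-damper: the amplitude spikes as the driving frequency approaches the system’s natural frequency, which is exactly the misbehavior of a poorly damped suspension or a washing machine on spin cycle.

```python
import numpy as np

m, k, c = 1.0, 400.0, 2.0             # mass, stiffness, damping (arbitrary SI-ish units)
w_n = np.sqrt(k / m)                  # natural frequency (rad/s)
F0 = 10.0                             # amplitude of the cyclic forcing

# Steady-state amplitude of m*x'' + c*x' + k*x = F0*cos(w*t)
for w in [0.5 * w_n, 0.9 * w_n, 1.0 * w_n, 1.1 * w_n, 2.0 * w_n]:
    amplitude = F0 / np.sqrt((k - m * w**2)**2 + (c * w)**2)
    print(f"drive at {w / w_n:.1f} x natural frequency -> amplitude {amplitude:.3f}")
```

Run it and the amplitude at resonance dwarfs the off-resonance cases, even though the forcing never changes.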

It’s the Stupid Economy

The sometimes-sizeable gap between “gut feel” and mathematical certainty is particularly common in complex systems. Coincidentally, our body politic is eager to tackle the most complex of systems, attempting to control them through taxation and regulation. The global climate and national economies seem to be recent, and often interconnected, favorites. I shall leave the arguments of climate science and engineering to others and, today, focus on the economy. When it comes to the intersection of economics and politics, I have noticed a pattern: the thinking is shockingly short-term. Shocking, because the economic environment may be the number one predictor of the outcome of an election. A good economy strongly favors the incumbents, whereas economic misery almost guarantees a changing of the guard. You would think that if economic conditions are what matter most to us, then when it comes to our one contribution to the governance of society, we’d be eager to get it right. Yet what seems to matter most are the economic conditions on the day of the polling. Four years of economic growth doesn’t mean much if the economy tanks on the 30th of October.

In something of a mixed blessing, the recent political free-for-all has challenged this shortsightedness, at least somewhat. President Obama, for years, blamed his predecessor for the recession and deficit spending, despite a negative economic climate persisting for years into his own term. He even famously took credit for the positive economic indicators during his successor’s term. His opponents, of course, sought to do the opposite. The truth is far more nuanced than any of them care to admit, but at least popular culture is broaching the subject. Most of us know that if “the economy” is looking up the day after the President signs off on a new initiative, it wasn’t his signature that did it. Or, more accurately, it can’t possibly account for the entirety of the impact, which may take months or years to reveal its full effect.

Going Viral

We have a further advantage when it comes to talking about the interaction between the economy, the novel coronavirus, and the resultant economic shutdown. The media has inundated us with bell curves and two-week lags. Most of us can appreciate the math that says if a Governor closes bars and restaurants today, we shouldn’t yet be looking for the results in our statistics tomorrow. Nonetheless, our collective grasp of dynamic systems and probabilities is tenuous under the best of circumstances. Mix in high levels of fear and incessant media hype, and even things that should be obvious become lost in the surrounding clamor.

Shift the playing field to economics and the conversation gets even murkier.

“The economy” is at the high end of complex and chaotic systems. It is, after all, not an external system that we can observe and interact with, nor is it subject to the laws of physics. Rather, it is the collective behavior of all of us, each and every individual, and how we interact with each other to produce, consume, and exchange. Indeed, one might speculate on where the boundaries really lie. It seems a bit insensitive to label everything as “economic activity” during a health crisis, but what is it that we can exclude? Waking and sleeping, each of us is in the process of consuming food, water, clothing, and shelter. Most social interactions also involve some aspect of contract, production, or consumption. Even if we can isolate an activity that seems immune to all that, all human activity still occurs within the structure that “society,” and thus “the economy,” provides.

Within that framework, anyone who claims to “understand” the economy is almost certainly talking about a simplified model and/or a restricted subset of economic activity. Either that, or they are delusional. Real economic activity cannot be understood. Even if the human mind were vastly more capable, the interaction of every human being on the planet is, quite simply, unpredictable. Because of this, we use proxies for economic activity as a way to measure health and the effects of policy. GDP and GDP growth are very common. Stock market performance substitutes for economic health in most of our minds and in the daily media. Business starts, unemployment numbers, average wages – each of these is used to gauge what is going on with the economy. However, every one of these metrics is incomplete at best and, more often than not, downright inaccurate in absolute terms.

Of course, it isn’t quite as bad as I make it out to be. GDP growth may contain plenty of spurious data, but if we understand what is and isn’t included, and apply it consistently, we can obtain feedback that guides our policymaking. For example, we could assume that the noisy prices associated with volatile commodities are not relevant to overall inflation numbers, or we could exclude certain categories when calculating GDP for the purpose of determining inflation. As long as we’re comparing apples to apples, our policy will be consistent.
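As a toy illustration of that “exclude the noisy categories” idea, here is a back-of-the-envelope sketch in Python. The category weights and price changes are entirely made up; the point is only that a headline number and an ex-food-and-energy (“core”) number can tell different stories from the same underlying data.

```python
categories = {
    # name: (weight in the basket, year-over-year price change)
    "housing":   (0.40, 0.03),
    "food":      (0.15, 0.08),
    "energy":    (0.08, 0.25),
    "transport": (0.17, 0.04),
    "other":     (0.20, 0.02),
}

def weighted_inflation(items):
    # Re-normalize the weights so an excluded category doesn't skew the result.
    total_weight = sum(w for w, _ in items.values())
    return sum(w * change for w, change in items.values()) / total_weight

headline = weighted_inflation(categories)
core = weighted_inflation({k: v for k, v in categories.items()
                           if k not in ("food", "energy")})
print(f"headline: {headline:.1%}, core: {core:.1%}")
```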

What happens, though, when we get the economic equivalent of a hydraulic shock? In this case, our models of the economic world no longer apply, and the world enters an entirely unpredicted and unpredictable realm. We know this. What I want to explore, however, is what happens to our ability to “control” that system. My guess is that it fails. It fails because we, again collectively, don’t appreciate the characteristics of dynamic systems. Yes, we understand them in terms of the heuristics we’ve traditionally used: interest rates have to be raised before inflation kicks in to keep it from spiraling out of control. But what inflation will result from a $5 trillion stimulus at a time of 30% unemployment? Do we need higher or lower interest rates? In other words, when our traditional metrics fail us, will we truly appreciate the complex nature of the system?
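For a feel of why acting on stale information is so dangerous, here is a minimal sketch of a feedback loop with measurement lag. The “economy” below is just a made-up, persistent toy variable, not a real macroeconomic model: the same corrective policy that settles things down when feedback is prompt ends up over-correcting and oscillating when it reacts to numbers that are several periods old.

```python
def simulate(delay_steps, gain=0.3, steps=40):
    # history[i] is the deviation from target at step i; we start off-target.
    history = [1.0] * (delay_steps + 1)
    for _ in range(steps):
        measured = history[-(delay_steps + 1)]   # the stale number the policymaker sees
        nxt = history[-1] - gain * measured      # natural persistence plus corrective policy
        history.append(nxt)
    return history[delay_steps + 1:]

print("prompt feedback: ", [round(v, 2) for v in simulate(0)[:16]])
print("delayed feedback:", [round(v, 2) for v in simulate(6)[:16]])
```

With no delay the deviation decays smoothly toward zero; with a six-period lag the same correction overshoots, swings negative, and keeps oscillating instead of settling.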

In Control

During our imposed downtime, I re-watched an excellent film about the now-10-plus-year-old financial crisis induced by the housing market. The Big Short was made in 2015, based on Michael Lewis’s 2010 book of the same name. It dramatizes the subprime housing market collapse as seen by a handful of investors who saw it coming. As much as the story seems, today, to belong to our distant past, there are those among us who feel that what we witnessed in 2008 was just the opening chapter of a longer tale. Whether a housing crisis is in our past or our future, there are lessons to be applied to the present day.

The film’s story opens in 2005. Investor Michael Burry, reading the details of mortgage-backed security prospectuses, determines that the housing market is unstable and the financial instruments built upon it are doomed to fail. Unable to take a contrarian financial position using existing instruments, he commissions the creation of credit default swaps on mortgage bonds to allow him to bet that the mortgage market will fail. The film concludes when Burry, and several others who bet against the housing market, liquidate their positions at a profit, sometime after the spring of 2008. The real-life Burry had actually been analyzing data from 2003 and 2004 before making his predictions and his commitment. Burry later wrote a piece for The New York Times saying that the housing market failure was predictable as much as four or five years out.

Putting this another way, by 2004 or 2005, the massive financial crisis of 2008-2010 had already happened; we just didn’t realize it yet. One might argue that sometime in those intervening years, sanity might have come over America’s banks and the prospective homeowners to whom they were lending, but of course it didn’t. There are many reasons why it didn’t; why perhaps it couldn’t. Thus the events that all-but-inevitably put us on the road to global financial collapse happened [four, five, more?] years in advance of what we would consider the start of the crisis. Unemployment numbers didn’t recover until 2014. That implies that for the individual, perhaps someone becoming unemployed and being unable to find a new position circa 2014, the impact of the collapse may have taken more than a decade to manifest itself.

Again, let’s look at it from a different angle. Suppose I wanted to avoid the tragedy that befell that individual who, in 2014, became unemployed. Let’s imagine that, as a result of his lack of employment, he died. Maybe it was suicide or opioid addiction. Maybe the job loss turned into a home loss and his whole family suffered. Suppose, as a policymaker, I wanted to take macro-economic action to avoid that unnecessary death. How soon would I have had to act? 2003? 2000? Sometime in the 1990s?

Next Time

All of this comes to mind today as a result of the talk I am seeing among my fellow citizens. People are angry, although that isn’t entirely new. Some are angry because their livelihoods have been shut down while others are angry that folks would risk lives and health merely to return to those livelihoods. In the vast majority of cases, however, the fear is about near term effects. Will my restaurant go bankrupt given the next few weeks or months of cash-flow? What will the virus do two weeks after the end of lockdown? Will there be a “second wave” next fall? A recent on-line comment remarked that, although the recovery phase would see bumps along the road, “We’ll figure it out. We always do.”

Statistically, that sentiment is broadly reflected in the population at large. A summary of poll data through the end of March (http://www.apnorc.org/projects/Pages/Personal-impacts-of-the-coronavirus-outbreak-.aspx) suggested similar thinking. A majority of those currently out-of-work see no problems with returning “once it’s over.” In fact, a majority figure that by next year they’ll be as good as or better off financially than they are now. Statements like “we’ll get through this and come out stronger than ever” can be very motivational, but extending that to all aspects of economic and financial health seems a bit blind.

We’re losing track of the macro-economic forest for the personally experienced trees. We’ve all seen the arguments. Is it better to let grandpa die so that the corner burger shack can open back up a few weeks earlier? The counter-argument cites the financial impact of keeping the economy mostly closed down for a few more weeks. This isn’t the point, though, is it? On all sides of the argument, the assumption seems to be that we can just flip everything back on and get back to business. We are oblivious to the admittedly unanswerable question – how much damage has already been done?

Unprecedented

Words like “historic” and “unprecedented” are tossed around like confetti, but not without reason. In many ways our government and our society have done things – already done things, mind you – that have never happened before in the history of man. At first, the “destruction” seemed purely financial. Restaurants being shut down meant a loss in economic activity; a destruction of GDP. But is that even a real thing? Can’t we just use a stimulus bill to replace what is lost and call it even? But as April turns into May, we’re starting to see stories of real and literal destruction, not just lost opportunity. Milk is dumped because it can’t be processed. Vegetables are plowed under. Beef cattle and chickens are killed without processing. This is actual destruction of real goods. Necessary goods. How can this go away with a reopening and some forgivable loans?

None of the experience gained through the financial crises of my lifetime would seem to apply. Even the Great Depression, while perhaps comparable in magnitude, seems to miss the mark in terms of mechanism. We’re simultaneously looking at a supply shock, a consumer depression, and inflationary fiscal policy. It’s all the different flavors of financial crisis, but all at the same time. Imagine a hydraulic shock in some rotating equipment where the control system itself has encountered a critical failure. I’ve decided that, for me, the best comparison is the Second World War. Global warfare pulled a significant fraction of young men out of the workforce, many never to return. Shortages ravaged the economy, both through the disruption of commerce and through the effects of rationing. A sizeable percentage of American economic output was shipped overseas and blown up; gone.

Yet we got through it. We always do.

But we did so because we were willing to make sacrifices for the good of the nation and the good of the free world. We also lost a lot of lives and a lot of materiel. If “we” includes the citizens of Germany or Ukraine, the devastation to society and culture was close to total, depending on where they called home. So, yes, civilization came through the Second World War and, as of a year or so ago, was arguably better off than ever, but for far too many that “return to normalcy” took more than a generation. Will that be the price we have to pay to “flatten the curve”?

 

 

Artificial, Yes, but Intelligent?

Keep your eyes on the road, your hands upon the wheel.

When I was in college, only one of my roommates had a car. The first time it snowed, he expounded upon the virtues of finding an empty and slippery parking lot and purposely putting your car into spins.  “The best thing about a snow storm,” he said. At the time I thought he was a little crazy. Later, when I had the chance to try it, I came to see it his way. Not only is it much fun to slip and slide (without the risk of actually hitting anything), but getting used to how the car feels when the back end slips away is the first step in learning how to fix it, should it happen when it actually matters.

Recently, I found myself in an empty, ice-covered parking lot and, remembering the primary virtue of a winter storm, I hit the gas and yanked on the wheel… but I didn’t slide. Instead, I encountered a bunch of beeping and flashing as the electronic stability control system on my newish vehicle kicked in. What a disappointment it was. It also got me thinkin’.

For a younger driver who will almost never encounter a loss-of-traction slip condition, how do they learn to recover from a slide or a spin once it starts? Back in the dark ages, when I was learning to drive, most cars were rear-wheel-drive with a big, heavy engine in the front. It was impossible not to slide around a little when driving in a snow storm. Knowing all the tricks of slippery driving conditions was almost a prerequisite to going out into the weather. Downshifting (or using those numbered gears on your automatic transmission), engine braking, and counter-steering were all part of getting from A to B. As a result*, when an unexpectedly slippery road surprises me, I instinctively take my foot off the brakes/gas and counter-steer without having to consciously remember the actual lessons. So does a car that prevents sliding 95% of the time result in a net increase in safety, even though it probably makes that other 5% worse? It’s not immediately obvious that it does.

On the Road

I was reminded of the whole experience a month or so ago when I read about the second self-driving car fatality. Both crashes happened within a week or so of each other in Western states; the first in Arizona and the second in California. In the second crash, Tesla’s semi-autonomous driving function was in fact engaged at the time of the crash, and the driver’s hands were not on the wheel six seconds prior. Additional details do not seem to be available from media reports, so the actual how and why must remain the subject of speculation. In the first, however, the media has engaged in the speculation for us. In Arizona, it was an Uber vehicle (a Volvo in this case) that was involved, and the fatality was not the driver. The media has also reported quite a lot that went wrong. The pedestrian who was struck and killed was jaywalking, which certainly is a major factor in her resulting death. Walking out in front of a car at night is never a safe thing to do, whether or not that car is self-driving. Secondly, video was released showing the driver looking at something below dashboard level immediately before the crash, and thus not aware of the danger until the accident occurred. The self-driving system itself did not seem to take any evasive action.

Predictably, the Arizona state government responded by halting the Uber self-driving car program. More on that further down, but first look at the driver’s distraction.

After the video was released, media attention focused on the distracted-driving angle of the crash. It also brought up the background of the driver, who had a number of violations behind him. Certainly the issue of electronics and technology detracting from safe driving is a hot topic and something, unlike self-driving Uber vehicles, that most of us encounter in our everyday lives. But I wonder whether this exposes a fundamental flaw in the self-driving technology.

It’s not exactly analogous to my snow situation above, but I think the core question is the same. The current implementation of self-driving car technology augments the human driver rather than replaces him or her. In doing so, however, it also removes some of the responsibility from the driver and makes him more complacent about the dangers he may be about to encounter. The more the car does for the driver, the greater the risk that the driver will allow his attention to wander rather than stay focused, on the assumption that the autonomous system has him covered. In the longer term, are there aspects of driving that the driver will not only stop paying attention to, but lose the ability to manage in the way a driver of a non-automated car once did?

Naturally, all of this can be designed into the self-driving system itself. Even if a car is capable of, essentially, driving itself over a long stretch of highway, it could be designed to engage the driver every so many seconds. Requiring otherwise unnecessary input from the operator can be used to make sure she is ready to actively control the car if needed. I note that we aren’t breaking new ground here. A modern aircraft can virtually fly itself, and yet some parts of the design (plus operational procedures) are surely in place to make sure that the pilots are ready when needed.

As I said, the governmental response has been to halt the program. In general, it will be the governmental response that will be the biggest hurdle for self-driving car technology.

In the specific case of Arizona, I’m not actually trying to second guess their decision. Presumably, they set up a legal framework for the testing of self-driving technology on the public roadways. If the accident in question exceeded any parameters of that legal framework, then the proper response would be to suspend the testing program. On the other hand, it may be that the testing framework had no contingencies built into it, in which case any injuries or fatalities would have to be evaluated as they happen. If so, a reactionary legal response may not be productive.

I think, going forward, there is going to be a political expectation that self-driving technology should be flawless. Or, at least, perfect enough that it will never cause a fatality. Never mind that there are 30-40,000 motor vehicle deaths per year in the United States and over a million per year worldwide. It won’t be enough that an autonomous vehicle is safer than a non-autonomous vehicle; it will have to be orders of magnitude safer. Take, as an example, passenger airline travel. Despite a safety rate that is probably about 10X better for aircraft than for cars, the regulatory environment for aircraft is much more stringent. Take away the “human” pilot (or driver) and I predict the requirements for safety will be much higher than even for aviation.

Where I’m headed with all this is, I suppose, to answer the question of when we will see self-driving cars. It is tempting to see that as a technological question – when will the technology be mature enough to be sold to consumers? But it is more than that.

I recall seeing somewhere an example of “artificial intelligence” for a vehicle system. The example was of a system that treats a ball rolling across the street as a trigger for logic that anticipates a child chasing that ball. A good example of an important problem to solve before putting an autonomous car onto a residential street. Otherwise, one child run down while chasing his ball might be enough for a regulatory shutdown. But how about the other side of that coin? What happens the first time a car swerves to avoid a non-existent child and hits an entirely-existent parked car? Might that cause a regulatory shutdown too?

Is regulatory shutdown inevitable?

Robo-Soldiers

At roughly the same time that the self-driving car fatalities were in the news, there was another announcement, even more closely related to my previous post. Video-game developer EA posted a video showing the results of a multi-disciplinary effort to train an AI player for their Battlefield 1 game (which, despite the name, is far from the first entry in the Battlefield series). The narrative for this demo is similar to that of Google’s (DeepMind) chess program. The training was created, as the marketing pitch says, “from scratch using only trial and error.” Without viewing it, one might think it runs counter to my previous conclusions, in which I figured that the supposed generic, self-taught AI was perhaps considerably less than it appeared.

Under closer examination, however, even the minute-and-a-half of demo video does not quite measure up to the headline hype, the assertion that neural nets have learned to play Battlefield, essentially, on their own. The video explains that the training method involves manually placing rewards throughout the map to try to direct the behavior of the agent-controlled soldiers.
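To make the reward-placement idea concrete, here is a minimal sketch of the general technique (reward shaping) on a toy grid world, using plain tabular Q-learning in Python. This is purely illustrative and bears no relation to EA’s actual setup: the hand-placed “breadcrumb” rewards nudge the agent along a route its designers want explored, in addition to the real payoff at the goal.

```python
import random

WIDTH, HEIGHT = 8, 8
START, GOAL = (0, 0), (7, 7)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
# Hand-placed shaping rewards: small bonuses pulling the agent along a desired path.
SHAPING = {(2, 2): 0.1, (4, 4): 0.1, (6, 6): 0.1}

Q = {}  # (state, action) -> estimated value

def step(state, action):
    x = min(max(state[0] + action[0], 0), WIDTH - 1)
    y = min(max(state[1] + action[1], 0), HEIGHT - 1)
    nxt = (x, y)
    # Real objective at the goal, plus breadcrumbs, minus a small per-step cost.
    reward = 1.0 if nxt == GOAL else SHAPING.get(nxt, 0.0) - 0.01
    return nxt, reward, nxt == GOAL

def choose(state, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)                              # explore
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))      # exploit

alpha, gamma = 0.5, 0.95
for episode in range(2000):
    state, done = START, False
    while not done:
        action = choose(state)
        nxt, reward, done = step(state, action)
        best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = nxt

print("Learned value of moving right from the start:", Q.get((START, (1, 0)), 0.0))
```

Move the breadcrumbs and the learned route moves with them, which is the point: the “self-taught” behavior is still being steered by human design choices.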

The time frame for a project like this one would seem to preclude their being directly inspired by DeepMind’s published results for chess. Indeed, the EA Technical Director explains that it was earlier DeepMind work with Atari games that first motivated them to apply the technology to Battlefield. Whereas the chess example demonstrated the ability to play chess at a world-class level, the EA project demonstration merely shows that the AI agents grasp the basics of game play and not much more. The team’s near-term aspirations are limited; use of AI for quality testing is named as an expected benefit of this project. He does go so far as to speculate that, a few years out, the technology might be able to compete with human players within certain parameters. Once again, a far cry from a self-learning intelligence poised to take over the world.

Even still, the video demonstration offers a disclaimer. “EA uses AI techniques for entertainment purposes only. The AI discussed in this presentation is designed for use within video games, and cannot operate in the real world.”

Sounds like they wanted to nip any AI overlord talk in the bud.

From what I’ve seen of the Battlefield information, it is results only. There is no discussion of the methods used to create training data sets and design the neural network. Also absent is any information on how much effort was put into constructing this system that can learn “on its own.” I have a strong sense that it was a massive undertaking, but no data to back that up. When that process becomes automated (or even part of the self-evolution of a deep neural network), so that one can quickly go from a data set to a trained network (quickly in developer time, as opposed to computing time), the promise of the “generic intelligence” could start to materialize.

So, no, I’m not made nervous that an artificial intelligence is learning how to fight small unit actions. On the other hand, I am surprised at how quickly techniques seem to be spreading. Pleasantly surprised, I should add.

While the DeepMind program isn’t open for inspection, some of the fundamental tools are publicly available. As of late 2015, the Google library TensorFlow is available as open source. As of February this year, Google is making available (still in beta, as far as I know) their Tensor Processing Unit (TPU) as a cloud service. Among the higher-profile uses of TensorFlow is the app DeepFake, which allows its users to swap faces in video. A demonstration, using a standard desktop PC and about half an hour’s training time, produced something comparable to Industrial Light and Magic’s spooky-looking Princess Leia reconstruction.

Meanwhile, Facebook also has a project inspired by DeepMind’s earlier Go neural network system. In a challenge to Google’s secrecy, the Facebook project has been made completely open source, allowing for full inspection of, and participation in, its experiments. At the beginning of May, Facebook announced a 14-0 record for their AI bot against top-ranked Go players.

Competition and massive-online participation is bound to move this technology forward very rapidly.

 

The future’s uncertain and the end is always near.

 

*To be sure, I learned a few of those lessons the hard way, but that’s a tale for another day.

ABC Easy as 42

Teacher’s gonna show you how to get an ‘A’

In 1989, IBM hired a team of programmers out of Carnegie Mellon University. As part of his graduate program, team leader Feng-hsiung Hsu (aka Crazy Bird) developed a system for computerized chess playing that the team called Deep Thought. Deep Thought, the (albeit fictional) original, was the computer created in Douglas Adams’s The Hitchhiker’s Guide to the Galaxy to compute the answer to Life, the Universe, and Everything. It was successful in determining that the answer was “42,” although it remained unknown what the question was. CMU’s Deep Thought, less ambitiously, was a custom-designed hardware-and-software solution to the problem of optimal chess playing.

Once at IBM, the project was renamed Deep Blue, with the “Blue” being a reference to IBM’s nickname of “Big Blue.”

On February 10th, 1996, Deep Blue won its first game against a chess World Champion, defeating Garry Kasparov. Kasparov would go on to win the match, but the inevitability of AI superiority was established.

Today, the ability of computer programs to defeat humans is no longer in question. While the game of chess may never be solved (à la checkers), it is understood that the best computer programs are superior players to the best human beings. Within the chess world, computer programs only make news for things like top players being suspected of using programs to gain an unfair advantage in tournament play.

Nevertheless, a chess-playing computer was in the news late last year. Headlines reported that a chess-playing algorithm based on neural networks, starting only from the rules for legal chess moves, had in four hours created a program that could beat any human and nearly all top-ranked chess programs. The articles spread across the internet through various media outlets, each summary featuring its own set of distortions and simplifications. In particular, writers who had been pushing articles about the impending loss of jobs to AI and robots jumped on this as proof that the end had come. Fortunately, most linked to the original paper rather than trying to decipher the details.

Like most, I found this to be pretty intriguing news. Unfortunately, I also happen to know a little (just a little, really) about neural networks, and didn’t even bother to read the whole paper before I started trying to figure out what had happened.

Some more background on this project: it was created at DeepMind, a subsidiary of Alphabet, Inc. This entity, formerly known simply as Google, reformed itself in the summer of 2015, with the new Google being one of many children of the Alphabet parent. Initial information suggested to me an attempt at creating one subsidiary for each letter of the alphabet, but time has shown that isn’t their direction. As of today, while there are many letters still open, several have multiple entries. Oh well, it sounded more fun my way. While naming a company “Alphabet” seems a bit uninspired, there is a certain logic to removing the name Google from the parent entity. No longer does one have to wonder why an internet company is developing self-driving cars.

Google’s self-driving car?

 

The last time the world had an Artificial Intelligence craze was in the 1980s into the early 1990s. Neural networks were one of the popular machine intelligence techniques of that time too. At first they seemed to offer the promise of a true intelligence; simply mimicking the structure of a biological brain could produce an ability to generalize intelligence, without people to craft that intelligence in code. It was a program that could essentially teach itself. The applications for such systems seemed boundless.

Unfortunately, the optimism was quickly quashed. Neural networks had a number of flaws. First, they required huge amounts of “training” data. Neural nets work by finding relationships within data, but that source data has to be voluminous and it has to be suited to teaching the neural network. The inputs had to be properly chosen, so as to work well with the network’s manipulation of that data, and the data themselves had to be properly representative of the space being modeled. Furthermore, significant preprocessing was required from the person organizing the training. Additional inputs would result in exponential increases in both the training data requirement and the amount of processing time to run through the training.

It is worthwhile to recall the computer power available to neural net programmers of that time. Even a high-end server of 35 years ago is probably put to shame by the Xbox plugged into your television. Furthermore, the Xbox is better suited to the problem. The mathematics capability of Graphics Processing Units (GPUs) is a more efficient design for solving these kinds of matrix problems. Just like Bitcoin mining, it is the GPU in a computer that is best able to handle neural network training.

To illustrate, let me briefly consider a “typical” neural network application of the previous generation. One use is something called a “soft sensor.” Another innovation of that same time was the rapid expansion in capabilities of higher-level control systems for industrial processes. For example, some kind of factory-wide system could collect real-time data (temperatures, pressures, motor speeds – whatever is important) and present them in an organized fashion to give an overview of plant performance and, in some cases, automate overall plant control. For many systems, however, the full picture wasn’t always available in real time.

Let’s imagine the production of a product which has a specification limiting the amount of some impurity. Largely, we know what the right operating parameters of the system are (temperatures, pressures, etc.), but to actually measure for impurities, we manually draw a sample, send it off to a lab for testing, and wait a day or two for the result. It would stand to reason that, in order to keep your product within spec, you must operate far enough away from the threshold that if the process begins to drift, you would usually have time to catch it before it goes out of spec. Not only does that mean you’re, most of the time, producing a product that exceeds specification (presumably at extra cost), but if the process ever moves faster than expected, you may have to trash a day’s worth of production created while you were waiting for lab results.

Enter the neural network and that soft sensor. We can create a database of the data that were collected in real time and correlate those data with the matching sample analyses that became available afterward. Then a neural network can be trained using the real-time measurements as inputs to produce an output predicting the sample measurement. Assuming that the lab measurement is deducible from the on-line data, you now have in your automated control system (or even just as a presentation to the operators) a real-time “measurement” of a quantity that otherwise wouldn’t be available until much later. Armed with that extra knowledge, you would expect to both cut operating costs (by operating tighter to specification) and prevent waste (by avoiding out-of-spec conditions before they happen).
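Here is a minimal sketch of that soft-sensor idea, assuming made-up process variables and a synthetic relationship between them and the lab result; a real application would of course use historian data rather than simulated numbers.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
temperature = rng.normal(350.0, 5.0, n)   # real-time readings (units are arbitrary)
pressure = rng.normal(12.0, 0.5, n)
feed_rate = rng.normal(100.0, 8.0, n)

# Pretend the lab-measured impurity depends (noisily) on the on-line readings.
impurity = (0.02 * (temperature - 350) + 0.3 * (pressure - 12)
            - 0.005 * (feed_rate - 100) + rng.normal(0, 0.05, n))

X = np.column_stack([temperature, pressure, feed_rate])
X_train, X_test, y_train, y_test = train_test_split(X, impurity, random_state=0)

soft_sensor = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0))
soft_sensor.fit(X_train, y_train)
print("R^2 on held-out samples:", soft_sensor.score(X_test, y_test))

# In operation: an instant impurity estimate from live readings, instead of
# waiting a day or two for the lab result.
print("Predicted impurity:", soft_sensor.predict([[352.0, 12.3, 97.0]])[0])
```

Whether the lab value really is deducible from those inputs is, as the paragraph above says, the whole question.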

That sounds very impressive, but I did use the word “assuming.” There were a lot of factors that had to come together before determining that a particular problem was solvable with neural networks. Obviously, the result you are trying to predict has to, indeed, be predictable from the data that you have. What this meant in practice is that implementing neural networks was a much bigger job than just the software project. It often meant redesigning your system to, for example, collect data on aspects of your operation that were never necessary for control, but are necessary for the predictive functioning of the neural net. You also need lots and lots of data. Operations that collected data slowly or inconsistently might not be capable of providing a data set suitable for training. Another gotcha was that collecting data from a system in operation probably meant that said system was already being controlled. Therefore, a neural net could just as easily be learning how your control system works, rather than the underlying fundamentals of your process. In fact, if your control reactions were consistent, they might be a much easier thing for the neural net to learn than the more subtle and variable physical process.

The result was that many applications weren’t suitable for neural networks and others required a lot of prep-work. Projects might begin with redesigning the data collection system to get more and better data. Good data sets in hand, one now was forced into time-intensive data analysis which was necessary to ensure a good training set. For example, it was often useful to pre-analyze the inputs to eliminate any dependent variables. Now, technically, that’s part of what the neural network should be good at – extracting the core dependencies from a complex system. However, the amount of effort – in data collected and training time – increases exponentially when you add inputs and hidden nodes, so simplifying a problem was well worth the effort. While it might seem like you can always just collect more data, remember that the data needed to be representative of the domain space.  For example, if the condition that results in your process wandering off-spec only occurs once every three or four months, then doubling your complexity might mean (depending on your luck) increasing the data collection from a month or two to over a year.
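As one small example of that pre-analysis step, here is a sketch of flagging near-duplicate (dependent) inputs before training. The column names are hypothetical, and a real project would involve far more than a correlation check; this just shows the flavor of the prep-work.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "inlet_temp": rng.normal(80.0, 2.0, 500),
    "pressure":   rng.normal(5.0, 0.3, 500),
})
# outlet_temp is almost a linear function of inlet_temp -- a dependent input.
df["outlet_temp"] = 0.9 * df["inlet_temp"] + rng.normal(0.0, 0.1, 500)

corr = df.corr().abs()
# Flag any pair of candidate inputs that is more than 95% correlated; keep one of each pair.
redundant = [(a, b) for a in corr.columns for b in corr.columns
             if a < b and corr.loc[a, b] > 0.95]
print(redundant)   # expect [('inlet_temp', 'outlet_temp')]
```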

Hopefully you’ll excuse my trip down neural net memory lane, but I wanted to set your expectations of neural network technology where mine were, because the state of the art is very different from what it was. We’ve probably all seen some of the results with image recognition, which seems to be one of the hottest topics in neural networks these days.

So back to when I read the article. My first thought was to think in terms of the neural network technology as I was familiar with it.

My starting point in designing my own chess neural net has to be a representation of the board layout. If you know chess, you probably have a pretty good idea how to describe a chess board. You can describe each piece using a pretty concise terminology. In this case, I figure it is irrelevant where a piece has been. Whether it started as a king’s knight’s pawn or a queen’s rook’s pawn doesn’t affect its performance. So you have 6 possible piece descriptors which need to be placed onto the 64 squares that they could possibly reside upon. So, for example, imagine that I’m going to assign an integer to the pieces, and then use positive for white and negative for black:

Pawn   Knight   Bishop   Rook   King   Queen
  1       2        3       4      5      6

My board might look something like this: 4,2,3,6,5,3,2,4,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0…-3,-2,-4
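Just to make that representation concrete, here is a quick sketch of building the opening position as a 64-element vector of signed integers (white positive, black negative), using the numbering from the table above:

```python
import numpy as np

PIECE = {"pawn": 1, "knight": 2, "bishop": 3, "rook": 4, "king": 5, "queen": 6}

back_rank = [PIECE["rook"], PIECE["knight"], PIECE["bishop"], PIECE["queen"],
             PIECE["king"], PIECE["bishop"], PIECE["knight"], PIECE["rook"]]

board = np.zeros(64, dtype=int)
board[0:8] = back_rank                   # white back rank: 4,2,3,6,5,3,2,4
board[8:16] = PIECE["pawn"]              # white pawns
board[48:56] = -PIECE["pawn"]            # black pawns
board[56:64] = [-p for p in back_rank]   # black back rank, negated
print(board)
```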

If I am still living in the 90s, I’m immediately going to be worried about the amount of data, and might wonder if I can compress the representation of my board based on assumptions about the starting positions. I’ve got all those zeros in the center of my matrix, and as the game progresses, I’m going to be getting fewer pieces and more zeros. Sixty-four inputs seems like a lot (double that to get current position and post-move position), and I might hope to winnow that down to some manageable figure with the kind of efforts I talked about above.

If I hadn’t realized my problem already, I’d start to figure it out now. Neural networks like inputs to be proportional. Obviously, binary inputs are good – something either affects the prediction or it doesn’t. But for variable inputs, the variation must make sense in terms of the problem you are solving. Using the power output of a pump as an input to a neural network makes sense. Using the model number of that pump, as an integer, wouldn’t make sense unless there is a happenstance relationship between the model number and some quantity meaningful to your process. Going back to my board description above, I could theoretically describe the “power” of my piece with a number between 1 and 10 (as an example), but any errors in my ability to accurately rank my pieces contribute to prediction errors. So is a queen worth six times a pawn, or nine? Get that wrong, and my neural net training has an inaccuracy built in right up front. And, by the way, that means “worth” to the neural net, not to me or other human players.

A much better way to represent a chess game to a mathematical “intelligence” is to describe the pieces. So, for example, each piece could be described with two inputs giving its deviation from that piece’s starting position in the X and Y axes, with perhaps a third node to indicate whether the piece is on the board or captured. My starting board then becomes, by definition, 96 zeros, with numbers being populated (and generally growing) as the pieces move. It’s not terribly bigger (although rather horrifyingly so to my 90s self) than the representation by board, and I could easily get them on par by saying, for example, that captured pieces are moved elsewhere on the board, but well out of the 8x8 grid. Organizing by the pieces, though, is both non-intuitive for us human chess players and, in general, would seem less efficient in generalizing to other games. For example, if I’m modelling a card game (as I talked about in my previous post), describing every card and each of its possible positions is a much bigger data set than just describing what is in each hand and on the table. But, again, it should be clear that the description of the board is going to be considerably less meaningful as a mathematical entity than the description created by working from each game piece.
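A sketch of that piece-centric encoding, assuming three numbers per piece (offsets from the starting square plus a captured flag), which makes the opening position exactly the 96 zeros described above; the index assignments are arbitrary:

```python
import numpy as np

N_PIECES = 32                          # 16 white, 16 black
features = np.zeros((N_PIECES, 3))     # columns: dx from start, dy from start, captured

def move(piece_index, dx, dy):
    features[piece_index, 0] = dx
    features[piece_index, 1] = dy

def capture(piece_index):
    features[piece_index, 2] = 1.0

# Opening position: all zeros. After 1. e4 (say index 12 is white's king pawn):
move(12, 0, 2)
print(features.flatten())              # the 96-element input vector for the net
```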

At this point, it is worth remembering again that this is no longer 1992. I briefly mentioned the advances in computing, both in power and in structure (the GPU architecture being superior for solving matrix math). That, in turn, has advanced the state of the art in neural network design and training. The combination goes a long way toward explaining why image recognition is once again eyed as a problem for neural networks to address.

Consider the typical image. It is a huge number of pixels of (usually) highly-compressible data. But compressing the data will, as described above, befuddle the neural network. On the other hand, those huge, sparse matrices need representative training data to evenly cover the huge number of inputs, with that need increasing geometrically. It can quickly become, simply, too much of a problem to solve in a timely manner no matter what kind of computing power you’ve got to throw at it. But with that power, you can do new and interesting things. A solution for image recognition is to use “convolutional” networks.

Without trying to be too technically precise, I’ll try to capture the essence of this technique. The idea is that the input space can be broken up into sub-spaces (in an image, small fractions of the image) that then feed a significantly smaller neural network. One can then assume that those small networks are all the same or similar to each other. For image recognition, we might apply hundreds or even thousands of copies of such a network, each seeing perhaps 1% of the image (in overlapping segments), creating a (relatively) small output from the large number of pixels. Those outputs then feed a whole-image network. It is still a massive computational problem, but immensely smaller than the problem of training a network that takes the entire image as its input.
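In Keras-style code (a hand-wavy sketch, not any particular published architecture), the “same small network slid over patches of the image” idea looks like this:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),             # a small grayscale image
    # The same 8x8 filter bank is applied to overlapping patches across the image.
    tf.keras.layers.Conv2D(8, kernel_size=8, strides=4, activation="relu"),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),          # the "whole-image" network
    tf.keras.layers.Dense(10, activation="softmax"),       # e.g. ten object classes
])
model.summary()
```

The shared weights are what keep the parameter count (and thus the training data requirement) from exploding with the number of pixels.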

Does that make a chess problem solvable? It should help, especially if you have multiple convolutional layers. So there might be a neural network that describes each piece (6 inputs for old/new position (2D) plus on/off board) and reduces it to maybe 3 outputs. A second could map similar pieces: where are the bishops? where are the pawns? Another sub-network, repeated twice, could try looking at just one player at a time. It is still a huge problem, but I can see that it’s something that is becoming solvable given some time and effort.
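A rough sketch of that first layer of the idea, assuming Keras: one small network is shared across all 32 pieces (six numbers in, three out for each), and its outputs feed a network that looks at the whole position. This is purely my own illustration, not the DeepMind architecture.

```python
import tensorflow as tf

pieces = tf.keras.Input(shape=(32, 6))        # 32 pieces, 6 numbers each (per the description above)
# A Dense layer applied to a 3-D tensor reuses the same weights for every piece.
per_piece = tf.keras.layers.Dense(3, activation="relu")(pieces)   # -> (batch, 32, 3)
flat = tf.keras.layers.Flatten()(per_piece)                       # 96 features
hidden = tf.keras.layers.Dense(64, activation="relu")(flat)
score = tf.keras.layers.Dense(1, activation="tanh")(hidden)       # e.g. a position evaluation

model = tf.keras.Model(pieces, score)
model.summary()
```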

Of course, this is Alphabet, Inc we are talking about. They’ve got endless supplies of (computing) time and (employee) effort, so if it is starting to look doable to a mere human like me, it is certainly doable for them.

At this point, I went back to the research paper, wherein I discovered that some of my intuition was right, although I didn’t fully appreciate that last point. Just as a simple example, the input layer for the DeepMind system represents each piece as a board showing the position of the piece: 32 of them, each an 8-by-8 grid of 64 squares. They also use a history of turns, not just the current and next turn. It is orders of magnitude more data than I anticipated, but in extremely sparse data sets. In fact, it looks very much like image processing, but with much more ordered images (to a computer mind, at least). The paper states they are using Tensor Processing Units, a Google concoction meant to use hardware having similar advantages to the GPU and its matrix-math specialization, but further optimized specifically to solve this kind of neural network training problem.
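To picture that input format, here is a small sketch of encoding a position as a stack of 8x8 binary planes, one per piece type and color. The history planes and other features used in the paper are omitted, and the layout is simplified for illustration.

```python
import numpy as np

PIECE_TYPES = ["pawn", "knight", "bishop", "rook", "queen", "king"]

def position_to_planes(white, black):
    """white/black: dicts mapping piece type -> list of (rank, file) squares."""
    planes = np.zeros((2 * len(PIECE_TYPES), 8, 8), dtype=np.float32)
    for i, piece in enumerate(PIECE_TYPES):
        for rank, file in white.get(piece, []):
            planes[i, rank, file] = 1.0
        for rank, file in black.get(piece, []):
            planes[i + len(PIECE_TYPES), rank, file] = 1.0
    return planes

# Just the kings and the white pawns of the opening position, for illustration.
white = {"king": [(0, 4)], "pawn": [(1, f) for f in range(8)]}
black = {"king": [(7, 4)]}
print(position_to_planes(white, black).shape)   # (12, 8, 8) -- sparse, image-like input
```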

So let’s finally go back to the claim that got all those singularity-is-nigh dreams dancing in the heads of internet commentators. The DeepMind team was able to train, in a matter of (really) twenty-four hours, a superhuman-level chess player with no a priori chess knowledge. Further, the paper states that the training set consists of 800 randomly-generated games (constrained only to be made up of legal moves), which seems like an incredibly small data set. Even realizing how big those representations are (with their sparse descriptions of the piece locations as well as per-piece historical information), it all sounds awfully impressive. Of course, that is 800 games per iteration. If I’m reading right, that might be 700k iterations in over 9 hours, using hardware nearly inconceivable to us mortals.

And that’s just the end result of a research project that took how long? To get to the point where they could hit the “run” button certainly took months, and probably years.

First you’ve got to come up with the data format, and the ability to generate games in that format. Surprisingly, the paper says that the exact representation wasn’t a significant factor. I suppose that is an advantage of its sparseness. Next, you’ve got to architect that neural net. How many convolutions over what subsets? How many layers? How many nodes? That’s a huge research project, and one that is going to need huge amounts of data – not just the 800 randomly generated games you used at the end of it all.

The end result of all this – after a process involving a huge number of PhD hours and petaFLOPS of computational power – is that you’ve created a brain that can do one thing: learn about chess games. Yes, it is a brain without any knowledge in it – a tabula rasa – but it is a brain that is absolutely useless if provided knowledge about anything other than playing chess.

It’s still a fabulous achievement, no doubt. It is also research that is going to be useful to any number of AI learning projects going forward. But what it isn’t is any kind of demonstration that computers can out-perform people (or even mice, for that matter) in generic learning applications. It isn’t a demonstration that neural nets are being advanced into the area of general learning. This is not an Artificial Intelligence that could be, essentially, self-teaching and therefore life-like in terms of its capabilities.

And, just to let the press know, it isn’t the end of the world.