I’m a serial tech entrepreneur with Wall Street roots. I’m also in my 28th year of teaching mathematical finance and computing in the master’s program in mathematical finance at NYU’s Courant Institute of Mathematical Sciences. My Algorithmic Trading and Quantitative Strategies course, which I developed with Petter Kolm, is in its 18th year in Courant’s course offerings.
I was one of three founding partners of Pragma Financial Systems, an algorithmic trading boutique that keeps winning high-tech Wall Street praise (like this and this). Among its many clients are hedge funds, banks, pension managers and the New York Stock Exchange. Pragma was recently acquired by MarketAxess.
Before that, I was a quant developer on the Bear Stearns derivatives desk and then a statistical trader at Mint Investment Management. Mint was one of the first systematic mega funds and, at the time, one of the largest commodity trading advisors in the world by assets under management. (For good trading lore, check out Jack Schwager’s interview with Mint’s founding partner, Larry Hite, in the book Market Wizards.)
In 1999, I co-founded ThoughtWheel, a technology company that developed a novel knowledge management tool. (My patent for it can be found here.)
In 2002, my partners, David Mechner and Francis Mechner, and I founded Pragma Financial Systems, which eventually became Pragma Securities. After our first meeting, David came to one of my lectures on market microstructure. Those were the early days of algorithmic trading, and few people had heard of the Almgren-Chriss optimization, much less the more advanced form of dynamic portfolio trading that we wanted to offer to our clients. (Here is a paper by my colleagues at Courant, Petter Kolm and Gordon Ritter, describing a Bayesian approach to the problem.)
I served as Pragma’s Director of Research until 2008, when I was replaced by Dr. Peter Fraenkel, the same Peter Fraenkel with whom I co-taught my first class at Courant. While at Pragma, I also brought in a former student, Dr. Eran Fishler, who later took over for Peter and is currently Pragma’s architect in chief.
Since my departure from Pragma, I have spent my time developing tech startups, some that came entirely from my own ideas, and some in which I hold – or once held and subsequently sold – an equity interest. Not all of these startups are in finance, but all are influenced by my work on Wall Street studying the universal nature of markets.
Markets are everywhere. You might not think dating is anything like a financial market, but consider the following. In the financial markets, the initiator of a trade – the first to indicate interest or take action – is penalized. Is that also true of the market for romance? Check out this paper! And this one. Models of market microstructure predict that the initiator of the trade – the seeker – is penalized in all markets, including the market for romance. It’s a direct consequence of aversion to risk and a perception of information asymmetry.
I’m not just the frontman for my ideas. I like to get my hands dirty. I code in Java, Python, and R, to name a few. I like data science and have some favorite theories about the right way to run a modeling project. Some might not agree, but here’s a little taste of my thinking.
This paper by Andrew Ng and others shows that linear models with non-linear features perform as well as the most sophisticated deep neural nets, provided the number of features is very large. In other words, we can linearize non-linear models using a linear learner and a very large number of non-linear features.
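Here is a toy illustration of the idea, not the method from the paper: a minimal sketch in Python, assuming NumPy, a made-up one-dimensional target, and random ReLU features that I picked for convenience. The feature layer is random and never trained. The only thing that learns is a linear (ridge) regression on top of it.

```python
import numpy as np

def random_relu_features(X, W, b):
    """Map inputs through a fixed, random, non-linear layer (never trained)."""
    return np.maximum(X @ W + b, 0.0)

def fit_ridge(Phi, y, lam=1e-3):
    """Closed-form ridge regression: the only learned part is linear."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def target(X):
    """Hypothetical non-linear target, chosen only for illustration."""
    return np.sin(2.0 * X[:, 0])

rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(500, 1))
X_test = rng.uniform(-3.0, 3.0, size=(200, 1))
y_train, y_test = target(X_train), target(X_test)

# Baseline: a linear model on the raw input plus an intercept column.
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
A_test = np.hstack([X_test, np.ones((len(X_test), 1))])
w_linear = fit_ridge(A_train, y_train)
mse_linear = mse(A_test @ w_linear, y_test)

# Same linear learner, but on many random non-linear features.
n_features = 300  # assumption: "very large" relative to this tiny problem
W = rng.normal(size=(1, n_features))
b = rng.uniform(-3.0, 3.0, size=n_features)
Phi_train = random_relu_features(X_train, W, b)
Phi_test = random_relu_features(X_test, W, b)
w_features = fit_ridge(Phi_train, y_train)
mse_features = mse(Phi_test @ w_features, y_test)

print(f"linear on raw input:       test MSE = {mse_linear:.4f}")
print(f"linear on random features: test MSE = {mse_features:.4f}")
```

The learner is identical in both cases. All of the non-linearity lives in the features, which is the point.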
Why should we believe that simple models can perform as well as the most sophisticated? As Pedro Domingos explains in this paper, all models learned by gradient descent are approximately kernel machines, which is to say that the choice of model matters less than the quality of data and number of features. The benign overfitting revolution gives us permission to use models with many more features than classical statistics assumed was safe, and that’s all we need.
But what about the inherent non-linearity of deep learning tools? Isn’t there a benefit that goes beyond simple linear models? Surprisingly, the answer is no. Analyses of trained deep learning models – well summarized in this blog post – show that they are almost linear! Among the evidence is the fact that two fully trained deep learning models can be linearly combined for an improvement in performance.
More generally, maybe we’ve been looking at this the wrong way from the very beginning. Recent work supports the view that brains – the oft-cited but apparently incorrect inspiration for deep learning models – are not very hierarchical. Brain architecture is shallow and wide. Why? Maybe it’s harder to evolve non-linear structures. Their success might come to depend crucially on fine-tuning a small number of very powerful parameters. It may be easier to just keep adding more features.
Studies of human decision making show that we consider many more factors than would seem to be relevant. Our choice of coffee marginally influences which car we buy! As Daniel Dennett explained, brains are messy and parallel.
Ok, enough. Here’s a little about my other interests.
I read a lot. I spend time with my 6-year-old son and 11-year-old daughter. I love moral philosophy. I used to host a philosophy dinner in which my friends and I would cook for my guests. We tried to answer some weighty questions. I like the thorny ones, the ones that compel the thin-skinned to stomp out. It has happened! One topic temporarily derailed two separate romantic relationships (thankfully, at the time, not my own).
I box. I earned my judo black belt from the incomparable Teimoc Johnston Ono. I practice jiu jitsu. I take hip hop lessons. I commute on an electric unicycle.
–Lee
lee.maclin@gmail.com