Beyond the Hype of Big Data: An interview with Abraham Thomas of Quandl
12 April 2017
By Tiffany Regaudie
When you buy a book from Amazon, who knows about the purchase?
You know about it and Amazon knows about it. Your credit card company knows about it because it sees the charge, and so does your bank. Your email provider knows about it because it sees that Amazon has sent you a receipt. When the book is shipped, the shipping company knows about it. A GPS unit in the delivery truck knows about it for tracking purposes. Traffic monitors know about it as they watch the delivery truck drive along the highway. A satellite in space is also watching the truck as it arrives at your door. Perhaps there’s a security camera in your lobby that also knows about it as it captures the package being handed to the concierge.
We leave a trail of data across several databases with every decision we make. While this data is becoming increasingly valuable to investors, it can also be difficult to consolidate, interpret and act on. But Abraham Thomas insists this is a solvable problem, and it’s one he has been focusing on since he co-founded Quandl, “the marketplace for financial, economic and alternative data.”
I sat down with Abraham ahead of our panel discussion on April 20, where we aim to get “beyond the hype” and shed some light on the status of predictive analytics and big data — and ultimately their effect on investor relations and capital markets.
Tiffany Regaudie, Q4: I read that you were the youngest trader at a multi-billion dollar hedge fund. Tell me about what that was like and how you transitioned to become one of Canada’s thought leaders on big data.
Abraham Thomas, Quandl: I got my start in the industry in the mid-1990s, around the time the whole field of quantitative finance was just getting off the ground. Before that, a lot of investing was based on fundamental analysis and gut feeling. Some folks are very good at investing based on instinct, but that’s not necessarily scalable or repeatable. So when people discovered that they could apply mathematical models to the financial markets, it was quite a revolution. A whole bunch of mathematicians, physicists and engineers got recruited as part of that wave. I have an engineering background and got my first job programming mathematical models for a hedge fund. I then became an analyst, then a trader, then a portfolio manager. I had a good run, but I decided to return to my engineering roots by founding Quandl.
TR: We have so much data at our fingertips now that a big problem seems to be recognizing what’s good data and what’s bad data. How do you distinguish the good from the bad?
AT: When I look at a dataset, the first question I want to answer is, is it practically usable? By which I mean, is it clean and accurate? Is it well documented? Do I know what each field means? Does it have gaps or errors? Is it complete and consistent? Is it fresh and always up-to-date? Can I get it into the tools I use in my workflow? These are table stakes for usability; without these, no matter how interesting the data is, I can’t get value from it. Once I check off all these boxes, I proceed to evaluate the content. There are different ways in which content can be valuable: some datasets are predictive of future economic events; others correlate with asset prices or financial indicators; others are useful for benchmarking or historical evaluation; still others provide fundamental or qualitative insight. There’s a lot of data out there and the truth is the vast majority of it simply does not offer that kind of insight. Finally, I consider factors like uniqueness, exclusivity and competitive advantage. If I have data that my competitors do not have, that’s an edge for me and that makes the data valuable. Usability, content value and competitive advantage: these are my three main criteria for evaluating datasets.
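Some of the “table stakes” checks Abraham lists can be partly automated. The sketch below is only an illustration, not Quandl’s process. It assumes a dataset arrives as a list of Python dicts, each with a `date` key plus a set of documented fields, and that one observation is expected per calendar day. The function name `usability_report` and the one-day staleness threshold are hypothetical choices. A business-day series would need a different gap rule.

```python
from datetime import date


def usability_report(rows, fields, as_of, max_age_days=1):
    """Run basic usability checks on a daily time series (illustrative sketch).

    rows: list of dicts, each with a 'date' key (datetime.date) plus data fields.
    fields: the documented field names every row is expected to carry.
    as_of: the date against which freshness is measured.
    Assumes one observation per calendar day (hypothetical convention).
    """
    dates = sorted(r["date"] for r in rows)
    issues = []

    # Completeness and consistency: every row should carry every documented field.
    for r in rows:
        missing = [f for f in fields if f not in r]
        if missing:
            issues.append(f"{r['date']}: missing fields {missing}")
        nulls = [f for f in fields if f in r and r[f] is None]
        if nulls:
            issues.append(f"{r['date']}: null values in {nulls}")

    # Duplicate dates often signal a revision or ingestion error.
    seen = set()
    for d in dates:
        if d in seen:
            issues.append(f"{d}: duplicate observation")
        seen.add(d)

    # Gaps: any calendar day missing between consecutive observations.
    for prev, cur in zip(dates, dates[1:]):
        if (cur - prev).days > 1:
            issues.append(f"gap between {prev} and {cur}")

    # Freshness: the latest observation should be recent relative to as_of.
    if not dates:
        issues.append("dataset is empty")
    elif (as_of - dates[-1]).days > max_age_days:
        issues.append(f"stale: last observation {dates[-1]}")

    return issues


if __name__ == "__main__":
    rows = [
        {"date": date(2017, 4, 3), "value": 1.0},
        {"date": date(2017, 4, 4), "value": None},
        {"date": date(2017, 4, 6), "value": 1.2},
    ]
    for issue in usability_report(rows, ["value"], as_of=date(2017, 4, 12)):
        print(issue)
    # prints the null on 2017-04-04, the gap before 2017-04-06, and a staleness warning
```

Checks like these address only the first of his three criteria. Judging content value and competitive advantage still takes analysis and judgment.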
TR: I keep hearing that working with alternative data is both an art and a science. Do you subscribe to this? Are you more of the scientist, the artist, or both?
AT: I do subscribe to that statement. When we talk about alternative data, we’re talking about datasets that have not traditionally been used by the finance industry but that nonetheless have predictive value for the market. The science part is straightforward, merely a mathematical problem: finding out whether there’s a statistically significant correlation between dataset X and economic indicator Y. But because the world is exploding with data, you will often find something that seems predictive but may simply be a coincidence. For instance, someone discovered a correlation between the performance of the S&P 500 and the production of butter in Bangladesh. If you superimpose one graph over the other, they match. This is interesting, but would you really bet your hard-earned dollars on that correlation? Probably not. That’s where the art comes in. You need an intuitive sense of how the economy works and how the pieces fit together.
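The butter example shows a general hazard: two unrelated series that both trend over time can appear tightly correlated. The sketch below is a hedged illustration and does not reconstruct the original butter analysis. It simulates pairs of independent random walks, a common stand-in for trending series such as an index level. It then counts how often their levels look strongly correlated and compares that with how often their period-to-period changes do. The trial count, series length and 0.5 threshold are arbitrary assumptions. `naive_t_stat` is the standard test statistic for a correlation coefficient, and it assumes independent observations, an assumption that trending series violate.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def naive_t_stat(r, n):
    """t-statistic for H0: correlation = 0; valid only for independent observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def random_walk(n, rng):
    """Cumulative sum of n standard normal steps: a trending series with no real driver."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diffs(xs):
    """Period-to-period changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]


def spurious_rate(trials=200, n=250, threshold=0.5, seed=0):
    """Fraction of unrelated random-walk pairs whose |correlation| exceeds
    threshold, measured on levels and on changes."""
    rng = random.Random(seed)
    hits_levels = hits_changes = 0
    for _ in range(trials):
        a, b = random_walk(n, rng), random_walk(n, rng)
        if abs(pearson(a, b)) > threshold:
            hits_levels += 1
        if abs(pearson(diffs(a), diffs(b))) > threshold:
            hits_changes += 1
    return hits_levels / trials, hits_changes / trials


if __name__ == "__main__":
    levels, changes = spurious_rate()
    print(f"unrelated pairs with |r| > 0.5 on levels:  {levels:.0%}")
    print(f"unrelated pairs with |r| > 0.5 on changes: {changes:.0%}")

    # One pair: the naive test can label a meaningless level correlation "significant".
    rng = random.Random(42)
    a, b = random_walk(250, rng), random_walk(250, rng)
    r = pearson(a, b)
    print(f"r = {r:.2f}, naive t = {naive_t_stat(r, len(a)):.1f}")
```

Correlating changes rather than levels is one standard guard against this trap. It does not replace the economic judgment Abraham describes, because a chance match in changes is still a chance match.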
TR: Do you think that machines are becoming better at this intuitive portion of data analysis?
AT: I don’t think that machines are there yet. I’ve worked with AI and machine learning systems, and they are very good at finding patterns, but they still have some ground to cover before they can reliably make “intuitive” predictions. The question itself, however, might become irrelevant over time. Asking whether machines will become more intuitive is like asking whether a submarine knows how to swim. AI and ML will solve problems not by developing human-style economic intuition but by processing ever more data. That will improve their ability to find patterns without their ever needing to develop that same human artistry.
TR: This leads me to my next question, which gets to the heart of what our panel discussion will be about. What’s the most important thing people should know about AI’s current status for the financial markets, beyond the hype?
AT: In the past, a skilled financial analyst could model, in a spreadsheet or even in their head, all the variables needed to predict a company’s stock price behaviour: revenue, profits, past price history, major holders, short positions, etc. But with the explosion of alternative data that has been shown to correlate with the financial markets, there’s so much information that could affect a company’s behaviour that it’s impossible for a human being to track all of it. The value that AI and machine learning bring to the table is the simple ability to digest all that data and find signals within the flood, something a human brain could not do. As financial markets become more competitive and datasets grow in size, interpreting data will become crucial to staying competitive. But it’s not necessarily about speed anymore: the market is so efficient now that we are reaching the limit of what faster data analysis is worth. The next revolution will be AI’s ability to spot patterns the human mind can’t dream of processing, and thereby to find new ways of interpreting data for the financial markets.
Financial markets have always been in an arms race — AI now offers a new frontier in that race as data fuels the technology needed for it to grow.