A.I. historians (who knew such a thing existed?) love to talk about the A.I. winters — periods when research, development, and funding in the field of A.I. were significantly diminished and public interest in the topic faded considerably.
Depending on who you talk to, the most recent A.I. winter lasted well into the mid-2000s. But as the 2010s rolled around, advancements in computing power and breakthroughs from academic research led to a massive resurgence in the field, specifically around machine learning.
Massively improved computing power, combined with many old, and some new, techniques and algorithms, catalyzed the use of deep learning in industry, enabling brand new features in existing products, brand new products in existing companies, and brand new companies from existing ideas.
Some folks argue that we may now be seeing the limits of deep learning, and that we may be approaching a new A.I. winter. I don’t see that happening. The advancement of deep learning has actually opened up entirely new areas of research and development around A.I. Many companies, both large and small, are now seeing how they can apply advanced machine learning techniques to solve their business problems. And while we know that deep learning has its limitations, and can certainly fall short in certain respects, we now have more people than ever working to improve those shortcomings and to use deep learning to solve their own problems.
In my opinion, there are a few extremely interesting areas of research that can lead to improvements in A.I. that might have an impact as large as that of deep learning. In this post, I wanted to touch on a few of them, highlighting why I think they’re exciting and where (or if) I think they can be applied to business problems today.
From what I can tell, the most interesting areas of research fall into two key camps: improving deep learning itself, and advancing reinforcement learning.
Using deep learning algorithms isn’t always easy. It can frequently require extremely large datasets in order to properly train a network. It also requires significant time and effort to come up with an optimized network. And finally, predictions made by deep networks can be hard to explain and contextualize.
Training a deep network from scratch can require large amounts of data. While we are currently living in the age of big data, not every company or every problem lends itself to the aggregation of massive datasets. Creating a deep network with smaller datasets is an active area of research, and one of the areas that I think can make the biggest impact here is transfer learning.
Transfer learning is a process whereby a neural network is first trained for some task on a large amount of data. The base layers of that network are then separated from the final layers and used as the foundation of a new network that’s trained on a smaller dataset. The core idea is that some tasks are related (for example, recognizing objects in an image is similar to identifying key information from a photo of a check or license plate), so the foundational layers of a network trained on a task with plenty of data can be reused for a similar task that lacks it. All computer vision tasks, for instance, require a network to first identify simple features like edges and lines, which are then combined to recognize shapes and objects.
If transfer learning works well, it should be possible not only to build very accurate networks for common problems using smaller datasets, but also to develop more generalizable networks that can be applied to a variety of tangentially related problems.
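To make the mechanics concrete, here’s a minimal sketch in plain NumPy: a tiny network is trained on a data-rich task, its base layer is then frozen, and only a new head is trained on a much smaller dataset for a related task. Everything here (the two tasks, network sizes, and learning rates) is invented for illustration; real transfer learning typically reuses large pretrained networks rather than a two-layer toy.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- "Pretraining": learn base features on task A, where data is plentiful ---
# Hypothetical task A: classify points by whether x0 + x1 > 0 (2,000 examples).
X_a = rng.normal(size=(2000, 2))
y_a = (X_a[:, 0] + X_a[:, 1] > 0).astype(float)

W_base = rng.normal(scale=0.5, size=(2, 8))   # shared base layer
w_head_a = np.zeros(8)                         # task-A head

for _ in range(300):                           # joint training on task A
    H = relu(X_a @ W_base)
    err = sigmoid(H @ w_head_a) - y_a
    w_head_a -= 0.1 * H.T @ err / len(X_a)
    dH = np.outer(err, w_head_a) * (H > 0)
    W_base -= 0.1 * X_a.T @ dH / len(X_a)

# --- Transfer: freeze W_base, train only a new head on a tiny task-B dataset ---
# Hypothetical related task B: x0 - x1 > 0, with only 50 examples.
X_b = rng.normal(size=(50, 2))
y_b = (X_b[:, 0] - X_b[:, 1] > 0).astype(float)

w_head_b = np.zeros(8)
for _ in range(500):
    H = relu(X_b @ W_base)                     # frozen base features
    w_head_b -= 0.5 * H.T @ (sigmoid(H @ w_head_b) - y_b) / len(X_b)

acc = np.mean((sigmoid(relu(X_b @ W_base) @ w_head_b) > 0.5) == y_b)
```

Only the eight weights in `w_head_b` are learned for task B; the base layer’s edge-like features come along for free from task A.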
Deep learning also suffers from the problem of explainability — identifying the key factors that led to a neural network’s prediction. While neural networks aren’t the only machine learning algorithms that suffer from this “black box” stigma (ensembles like bagging and boosting also come to mind), they are the quintessential example used to showcase the lack of transparency that comes with advanced machine learning algorithms.
While I’m a believer in the idea that the world will eventually (though not soon) move beyond the need for full and complete insight into the output of a trained model, there are enough industries, regulations, and plain human curiosity and skepticism today that we still need a way to gain additional insight into a trained model’s output.
Today, the most interesting areas of research in this field are techniques that work to identify the key factors that go into an individual prediction (as opposed to an entire model). Some recent tools that I’ve seen emerge here (within the world of Python) include LIME and SHAP.
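LIME and SHAP each have much richer machinery, but the underlying intuition (perturb an individual input and watch how the prediction moves) can be sketched in a few lines. The model below is a hypothetical linear scorer, chosen purely so the attributions are easy to verify by hand:

```python
# A hypothetical trained model: a simple linear scorer stands in for the
# network whose individual prediction we want to explain.
w = [0.8, -1.2, 0.3]
b = 0.1

def predict(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

baseline = [0.0, 0.0, 0.0]       # reference input (e.g., the dataset mean)
x = [2.0, 1.0, -1.0]             # the single prediction to explain

# Occlusion-style attribution: how much does the score change when each
# feature is reset to its baseline value, holding the others fixed?
attributions = []
for i in range(len(x)):
    x_masked = list(x)
    x_masked[i] = baseline[i]
    attributions.append(predict(x) - predict(x_masked))
```

For this linear model each attribution works out to exactly `w[i] * (x[i] - baseline[i])`; nonlinear models with interacting features are where the more careful sampling and weighting inside LIME and SHAP become necessary.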
At some point, it may be possible to create new techniques to gain a broader understanding of an entire model (I’ve wondered if it would be possible to have a neural network trained to output a textual description of how it works), but for the time being, it seems like that will remain an area of research.
Another problem, albeit perhaps a less impactful one, is the need for data scientists and engineers to spend time optimizing their networks (or other ML models) for their exact problem space. This can take a lot of time and effort. The idea behind automatic machine learning is that it may now be possible to build systems that automatically select the best type of model for a given problem and then tune its architecture and hyperparameters.
With a really good auto ML suite, it may be possible for both technical and non-technical people to create optimized models that are suitable for their individual tasks.
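The simplest version of that loop can be sketched in a few lines: random search over a hypothetical hyperparameter space, with a toy “training” function standing in for fitting a real model. Production auto ML suites (architecture search, Bayesian optimization, and so on) are far more sophisticated, but the shape is the same:

```python
import random

random.seed(0)

def train_and_score(learning_rate, n_steps):
    """Toy stand-in for model training: gradient descent on f(x) = (x - 3)^2.
    Returns the final loss; a real auto ML system would fit a full model here."""
    x = 0.0
    for _ in range(n_steps):
        x -= learning_rate * 2 * (x - 3)
    return (x - 3) ** 2

# A made-up hyperparameter space to search over.
search_space = {"learning_rate": [0.001, 0.01, 0.1, 0.5], "n_steps": [10, 50, 100]}

trials = []
for _ in range(10):                                   # 10 random configurations
    params = {k: random.choice(v) for k, v in search_space.items()}
    trials.append((train_and_score(**params), params))

best_loss, best_params = min(trials, key=lambda t: t[0])
```

The human specifies only the search space and the budget; the system discovers which configuration actually works.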
Finally, the last area of deep learning that I’m excited to track is the ability of deep networks to create their own content or dramatically improve how they interpret existing content.
GANs, or generative adversarial networks, are now being used to generate artwork, images, text, and audio, and may someday be responsible for producing content and entertainment that only humans can produce today (yes, I know, a bit of hyperbole here, but if we look far enough down the line I think this could be possible).
Transformer networks, which I admittedly know less about, might be used to create standard contracts, documents, and articles that are actually interesting and relevant, rather than formulaic and robotic.
While I’m excited to see how deep networks improve in the coming years, I’m truly enthusiastic to see how reinforcement learning progresses. This form of machine learning involves machines learning how to make decisions that optimize for long-term consequences in the face of uncertainty.
In the past few years, research into reinforcement learning has primarily centered on video games. However, by its very nature, if reinforcement learning really takes off, I think it can be applied to any number of industries that require optimal decision-making in the face of short-term variability.
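That kind of decision-making can be sketched with tabular Q-learning, one of the field’s foundational algorithms. The environment below is a made-up five-state corridor where the agent must learn that repeatedly moving right pays off in the long run, even though most individual steps return no reward at all; all constants are chosen purely for illustration:

```python
import random

random.seed(1)

# A tiny corridor world: states 0..4, with a reward only for reaching the goal.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                        # move left, move right
alpha, gamma, eps = 0.5, 0.9, 0.2         # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Hypothetical environment: deterministic moves, reward only at the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(200):                      # episodes, from random starting states
    s = random.randrange(GOAL)
    for _t in range(100):                 # cap episode length
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda i: Q[s][i])
        nxt, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[nxt])
        Q[s][a] += alpha * (target - Q[s][a])
        s = nxt
        if done:
            break

greedy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
```

After training, the greedy policy moves right from every pre-goal state: the discounted value of the distant reward has propagated back through the Q-table.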
One of the key problems in the field of reinforcement learning lies in the need for humans to create an external “reward system” that provides a feedback mechanism for algorithms to identify and learn desired behaviors. While, at first glance, this may not seem like a major problem, the need for humans to create reward functions can lead learning algorithms to local maxima, biased behaviors, and an unexplored solution space.
With curiosity learning, agents in a reinforcement learning environment are incentivized to explore new situations and scenarios, which increases the likelihood that they will eventually discover highly desirable behaviors without being limited to the narrower space dictated by extrinsic rewards. The use of curiosity learning might also lead to agents being driven by their own intrinsic reward system, which would take the onus of creating and determining rewards (some of which can be subjective) off of human developers.
While I hesitate to say that this sort of behavior begins to mimic human behavior, I do think that if we can incentivize algorithms to get a general, high-level “understanding” of their environments by “scoping things out” they will perform better.
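True curiosity-driven methods usually derive the intrinsic reward from the prediction error of a learned dynamics model; a simpler stand-in that captures the flavor is a count-based novelty bonus, where rarely visited states are intrinsically more rewarding. The corridor environment below is invented for illustration, and notably there is no extrinsic reward at all:

```python
import math
from collections import defaultdict

# A made-up 10-cell corridor; the agent receives NO extrinsic reward anywhere.
N_STATES = 10
visit_counts = defaultdict(int)

def intrinsic_reward(state):
    """Count-based novelty bonus: rarely visited states are more 'interesting'."""
    return 1.0 / math.sqrt(visit_counts[state] + 1)

state = 0
visited = {0}
visit_counts[0] += 1

for _ in range(50):
    # Greedily move toward whichever neighbor promises the larger curiosity bonus.
    neighbors = [max(state - 1, 0), min(state + 1, N_STATES - 1)]
    state = max(neighbors, key=intrinsic_reward)
    visit_counts[state] += 1
    visited.add(state)
```

Even with no external reward signal, the novelty bonus alone pulls the agent through the entire state space — exactly the “scoping things out” behavior that makes curiosity so appealing.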
Finally, one area of research into reinforcement learning that I think could be a complete game changer lies in the use of models that assist agents in taking actions within their environments.
Today, it seems like most research in reinforcement learning centers on model-free algorithms. However, I’ve recently come across some interesting research around the idea of allowing agents to learn a model of their environment in an unsupervised manner, which they then use to make decisions when learning optimal behavior.
I have very little knowledge in this area, but the concept is exciting to me because it begins to mimic how humans operate. People operate within models of the world, and are generally able to make good (if not optimal) decisions within the models that they create. If we can begin to imbue some of those capabilities into machines, which can then use their computing power to make more optimized decisions than we can, it could enable a “best of both worlds” outcome, whereby machines are able to distill large amounts of information down to the most important aspects, and are then able to make very fast and optimal decisions within the models they create.
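Here’s a toy sketch of that idea under heavily simplified assumptions: the agent first learns a transition model of a small made-up environment purely from random interaction (no reward signal is needed for this step), then plans against the learned model with value iteration instead of querying the real environment again:

```python
import random

random.seed(2)

N_STATES, GOAL = 6, 5
ACTIONS = [-1, +1]

def env_step(s, a):
    """Hypothetical deterministic corridor; its dynamics are unknown to the agent."""
    return min(max(s + ACTIONS[a], 0), N_STATES - 1)

# 1. Learn a model of the environment from random interaction. No reward is
#    observed here: learning the dynamics is effectively unsupervised.
model = {}
s = 0
for _ in range(500):
    a = random.randrange(2)
    nxt = env_step(s, a)
    model[(s, a)] = nxt
    s = nxt

def lookup(st, a):
    return model.get((st, a), st)    # unseen transitions assumed to be self-loops

# 2. Plan entirely inside the learned model with value iteration; the real
#    environment is never touched again.
gamma = 0.9
V = [0.0] * N_STATES
for _ in range(50):
    V = [1.0 if st == GOAL else gamma * max(V[lookup(st, a)] for a in range(2))
         for st in range(N_STATES)]

policy = [max(range(2), key=lambda a: V[lookup(st, a)]) for st in range(N_STATES)]
```

The split mirrors the appeal described above: cheap, possibly messy experience is distilled into a compact model, and the expensive optimal-decision computation then runs against that model rather than the world itself.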
While I think the research being conducted today is fascinating and has enormous potential to further the field of machine learning as we know it, I still think we’re very far from a world of A.I. as most people think about it. I believe that for us to achieve artificial general intelligence, commonly abbreviated AGI, we’re going to somehow have to get machines to understand and execute common sense reasoning. They’ll have to learn to conceptualize and abstract at a very large scale. They’ll need to learn models of how many different aspects of the world work.
We still don’t really know how we could do this. While there are those who think we’re close, I suspect that this will prove to be an incredibly difficult thing to do, one that will require areas of research completely separate from the mathematically and computationally inclined areas we see today.
Humans are still unique and capable in ways that we won’t be able to replicate for a long time. But while we try, we can begin to take advantage of a variety of new techniques that will enable us to build tools and technologies that will help us improve the state of our lives in the near future.
Shanif Dhanani is the co-founder & CEO of Apteo. Prior to Apteo, Shanif was a data scientist and software engineer at Twitter, and prior to that he was the lead engineer and head of analytics at TapCommerce, a NYC-based ad tech startup acquired by Twitter. He has a passion for all things data and analytics, loves adventure traveling, and generally loves living in New York City.