We've put together some basics to help you get started.
How did the company start?
In 2017, Manan Shah and Shanif Dhanani, the co-founders of Apteo, came together under a common belief that technology and machine learning could improve investing. They created the company with the idea that they could use the latest techniques in A.I. and machine learning to replicate the workflow of a Wall Street analyst. Subsequently, they built a large A.I. engine that took in millions of data points from thousands of different sources and used it to provide stock rankings through a B2C website named Milton. As they started to sell the data from Milton, they realized that finance professionals were facing their own problems in sourcing, managing, and analyzing data, so they pivoted the company and began building OneData, a data science platform for everyone.
Who uses OneData?
Sell-side banks and asset managers love OneData for data analysis and discovery, while corporates, venture capital, private equity, and insurance companies use the platform to centralize key datasets and collaborate.
How big is your team?
As of January 2020, Apteo employs 11 people, many of whom have experience working at notable companies such as Twitter, DataDog, Thinknum, and Point72.
What products do you offer?
OneData, our data science platform, helps financial firms source, centralize, and analyze data. The platform provides a data catalog that centralizes and categorizes a firm’s paid and proprietary data alongside more than 2 million public datasets that we bring to the table, and an analytics module that provides historical correlative analytics and forward-looking predictive analytics in an intuitive, point-and-click interface. The platform is available as a freemium product on the cloud, or as an enterprise-wide custom deployment. API access is available for querying and retrieving data.
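As a rough illustration of what querying datasets through an API can look like, here is a minimal sketch that only builds a request URL. The endpoint, parameter names, and base URL below are invented for the example and are not the actual OneData API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- the real OneData API may use different paths and parameters.
BASE_URL = "https://api.example.com/v1/datasets"

def build_dataset_query(ticker, topic=None, limit=50):
    """Construct a dataset-search URL for a hypothetical datasets endpoint."""
    params = {"ticker": ticker, "limit": limit}
    if topic:
        params["topic"] = topic
    return f"{BASE_URL}?{urlencode(params)}"

url = build_dataset_query("AAPL", topic="consumer-spending")
print(url)
```

In a real integration, the returned URL would be fetched with an authenticated HTTP client using credentials issued to the client team.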
How do you source your data?
We provide more than two million public datasets sourced from federal, state, and municipal governments, non-profit organizations, and various other sources that make their data available to the general public. We also provide premium datasets, which we either generate or validate and verify ourselves. We do not provide web-scraped data by default, though we can do so for clients on a case-by-case basis.
Are all the datasets on your platform only relevant to the U.S.?
We make data available from a large number of both U.S. and international sources, and while the U.S. represents the country with the largest number of datasets on the platform, we actively work with our partners to onboard new data sources that they request.
Can you add datasets from non-English sources?
We work actively with our clients to onboard the data that they want to see on the platform. Because many of the automated tools we use to organize and categorize data rely on natural language processing, we request client assistance to help translate any key areas of a non-English dataset into English in order for it to be made available on the OneData platform.
Can you acquire new datasets at a client’s request?
Yes, we actively work with clients during the implementation period of our proof-of-concepts to find, clean, and onboard the datasets that they would like to see.
Are all your public datasets available for use by everyone on the platform?
All public data is available to everyone on the Apteo platform for viewing; however, only premium and enterprise clients have the ability to download this data on the platform or through an API.
How do you tag and categorize millions of datasets?
We leverage natural language processing to parse and understand datasets based on their names, descriptions, and keywords. From there, we can categorize each dataset based on its content. We then run millions of correlation and feature selection jobs to understand which stocks and metrics each dataset is associated with. Finally, we have a team of people who double-check and refine these categorizations.
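A deliberately simplified sketch of the keyword-matching portion of this kind of pipeline follows; the category names and keywords are invented for the example, and the production system uses full NLP models rather than substring matching.

```python
# Toy keyword-based dataset categorization (illustrative only).
CATEGORY_KEYWORDS = {
    "retail": {"sales", "store", "consumer"},
    "energy": {"oil", "gas", "electricity", "solar"},
    "employment": {"jobs", "payroll", "unemployment", "hiring"},
}

def categorize(name, description):
    """Assign every category whose keywords appear in the dataset's metadata."""
    text = f"{name} {description}".lower()
    return sorted(cat for cat, words in CATEGORY_KEYWORDS.items()
                  if any(w in text for w in words))

cats = categorize("US Weekly Payroll Report", "National payroll and hiring data")
print(cats)  # ['employment']
```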
How do you handle messy or incomplete data?
We’ve developed several tools to help us automatically clean datasets. These tools include outlier detection and removal or replacement, date format standardization, text-to-number conversion, data aggregation techniques, data binning and encoding techniques, data imputation, and dataset categorization. We also actively work with our clients to automate data transformations that they wish to see on the platform.
Can you customize the charts and graphs on the OneData platform?
Yes, we work with our enterprise clients to enable customization of these graphs.
How many tickers are on your platform?
We currently support most of the stocks in the Wilshire 2000, and we work actively with clients to integrate additional stocks on demand.
Can you add private companies onto your platform?
Yes, we actively work with our clients to onboard private companies onto our platform on demand.
How long does it take to add new stocks or tickers onto the platform?
Depending on the location and amount of requested information that’s available, we can add stocks to the platform in as little as a few days.
What types of data can be correlated on the platform?
By default, the platform allows users to correlate any company financial metric with any dataset available on the platform. We work with enterprises to customize this functionality on an as-needed basis.
What does the A.I. forecaster do?
The forecaster is a no-code, point-and-click interface that allows domain experts to use machine learning to forecast future values of financial metrics. Domain experts can easily set up a forecast by telling the system which stocks and metrics they want to forecast, along with any relevant data they want the system to use, and then receive a forecasted value of the relevant metric.
How do these models learn from the data? How can they learn about the future if all they have is historical data?
These models learn patterns by repeatedly separating data into two sliding windows: an initial set of historical data and a later set of historical data. The machine learning models repeatedly attempt to find patterns within the initial set of historical data, and then check how accurate they were on the later set of historical data, which they did not see. By measuring their accuracy on the later set of data, we can identify if they have learned enough to be accurate.
Each machine learning model has an internal mathematical representation of how it thinks the patterns in data can be used to make forecasts. The process of learning involves many repeated guesses and checks, where the models use what they have learned at any given point about the initial set of data to make forecasts for the later set of historical data. The models then measure how accurate or inaccurate they were and tweak their internal mathematical functions to improve their forecasts on the next round of learning. At some point they’re not able to make any notable improvements, at which point the learning process stops.
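The sliding-window idea described above can be sketched with a deliberately naive "model" (a rolling mean) standing in for the real learners: fit on an initial window, forecast the next unseen point, measure the error, then slide forward.

```python
# Sketch of walk-forward evaluation with a naive mean-forecast "model".
def walk_forward_errors(series, window=3):
    """For each position, 'train' on the last `window` points (take their
    mean) and measure the absolute error against the next, unseen point."""
    errors = []
    for i in range(window, len(series)):
        train = series[i - window:i]        # initial (seen) window
        forecast = sum(train) / len(train)  # model fit on history only
        actual = series[i]                  # later (unseen) point
        errors.append(abs(forecast - actual))
    return errors

errs = walk_forward_errors([1, 2, 3, 4, 5, 6], window=3)
print(errs)  # the mean forecast lags a steadily rising series by 2.0 each step
```

Real models would replace the mean with a learned function, but the seen/unseen split and accuracy measurement work the same way.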
Can Apteo produce confidence intervals for its point estimates?
We can in certain cases. For example, when a linear regression is the best model to use, we can provide confidence intervals quite easily. In other cases, we can provide confidence intervals after adjusting the optimization function for our models, which is a feature we can enable for our enterprise clients.
What learning algorithms do you use?
We incorporate standard machine learning algorithms, including linear regression, support vector machines, gradient boosting machines, random forests, and neural networks.
Are there custom or advanced models for enterprise customers vs. professionals?
The free version of our product has a limited number of models, whereas professional subscribers and enterprises have all models available to them. We can also implement custom models and additional hyper-parameter optimization for enterprise clients.
What kind of correlations can users run?
The OneData platform automatically handles datasets that are updated at different frequencies and offers several strategies for running correlations: over specified timeframes, as leading or coinciding correlations, and with different sliding windows for leading indicators. The system's advanced settings also let users modify the default behavior for datasets that are updated at different frequencies, including whether they want data to be forward-filled or imputed.
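A minimal sketch of a leading-indicator correlation follows, using a hand-rolled Pearson correlation with a configurable lag; the platform's actual options are richer, and the series below are invented for the example.

```python
import statistics

def lagged_correlation(indicator, target, lag):
    """Pearson correlation between an indicator shifted forward by `lag`
    periods and a target series. lag=0 gives a coinciding correlation;
    lag>0 tests the indicator as a leading indicator."""
    x = indicator[:len(indicator) - lag] if lag else indicator
    y = target[lag:]
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# An indicator that leads the target by exactly one period:
ind = [1, 2, 3, 4, 5]
tgt = [0, 1, 2, 3, 4]  # target follows the indicator one step later
print(round(lagged_correlation(ind, tgt, lag=1), 3))  # 1.0
```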
How does the forecaster work?
The forecaster uses machine learning to create predicted future values for key metrics. Users select the stock and metric that they would like to predict, along with any relevant datasets they think the system should take into account. The system then uses those suggested datasets, along with historical information about the metric, to learn any patterns useful for predicting future values of that metric, training various machine learning models on that data. Once these models have been trained, the one with the best accuracy is selected and used to make a future forecast.
Can users see in-sample vs. out-of-sample metrics for the forecaster? What about other evaluation metrics?
By default, these metrics are not shown. However, for enterprise implementations, we can make available a wide range of evaluation metrics for each model that is trained. We support several standard metrics for both in-sample and out-of-sample data, including root mean squared error, mean absolute error, and mean squared error, and we work with our clients to implement any other metrics they wish to see.
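For reference, the three standard metrics named above can be computed as follows (the actual and predicted values here are invented for the example):

```python
def evaluation_metrics(actual, predicted):
    """Compute MSE, RMSE, and MAE for a set of forecasts."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n  # mean squared error
    rmse = mse ** 0.5                                               # root mean squared error
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n    # mean absolute error
    return {"mse": mse, "rmse": rmse, "mae": mae}

m = evaluation_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(m)
```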
Given the relatively low number of data points available for training, how accurate are the models you build?
We use several different types of models during our training process, everything from simple linear models to larger and more complicated deep networks. Each model is then evaluated on its accuracy using time series cross-validation.
From there, we can evaluate the relative accuracy of each model against every other model, as well as the accuracy of each model against objective metrics. If none of the models have a sufficient level of accuracy, we can surface a warning to the end user that the output of the model may not be reliable. However, if a particular model was able to learn sufficiently, we can showcase its results as expected.
Additionally, we are able to implement synthetic data generation to increase the number of data points available for training, which can result in improved learning capabilities in certain scenarios. This feature is available for our enterprise clients.
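The model-comparison-and-warning flow described above can be sketched with toy forecast rules standing in for real trained models; the model names, holdout size, and warning threshold below are invented for the example.

```python
# Toy model selection by out-of-sample error: score several candidate
# "models" on held-out points and flag the result if even the best one
# is too inaccurate to be reliable.
def naive_last(history):   # forecast = last observed value
    return history[-1]

def mean_model(history):   # forecast = mean of all history
    return sum(history) / len(history)

def select_best_model(series, models, holdout=3, warn_threshold=1.5):
    scores = {}
    for name, model in models.items():
        errs = [abs(model(series[:i]) - series[i])
                for i in range(len(series) - holdout, len(series))]
        scores[name] = sum(errs) / len(errs)   # mean absolute error on holdout
    best = min(scores, key=scores.get)
    reliable = scores[best] <= warn_threshold  # otherwise surface a warning
    return best, scores[best], reliable

models = {"naive_last": naive_last, "mean": mean_model}
print(select_best_model([1, 2, 3, 4, 5, 6], models))
```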
Can clients incorporate their own proprietary learning models?
Depending on the way enterprise client models are trained and stored, we can work with client teams to incorporate the models they have trained and created.
How can clients integrate their data onto the platform?
We have several different data connectors that allow us to plug into the most common data sources, letting us integrate data into our platform wherever it's stored. We make this feature available to enterprise clients, and we work with clients to implement any data connectors they require that we do not currently have.
What data sources do you support?
We support many of the common data sources available today, including, but not limited to, SQL and NoSQL databases, flat files, S3 buckets, APIs, data lakes, and data warehouses. Additionally, we work with clients to add support for other data sources as needed.
Can clients use Apteo’s UI to integrate datasets from other vendors that are not on the platform?
Yes! This is one of our core use cases and we work with clients to integrate any of their vendors onto OneData.
Does client data need to be structured in a specific format to be onboarded onto the OneData platform?
We support many of the most common data formats (XML, JSON, CSV, TSV, plain-text, Excel), and will work with enterprise clients to support any other formats they require.
Does Apteo perform data scraping for clients?
Once a client’s compliance team signs off, we will work with enterprise clients to incorporate any data they would like to see on the platform.
Working With Data Science Teams
How does Apteo work with internal data science teams?
We find that in most enterprises, data scientists are spending a large majority of their time on low value tasks, like data cleaning and wrangling. Our platform automates many of these tasks out of the box, and we work with data science teams to implement data transformations and aggregations that they spend much of their time on. This allows them to iterate quickly on higher value tasks, like running simulations, conducting experiments, analyzing data, and feature engineering.
Can clients add their own code or algorithms into the platform?
We’re happy to learn more about the use case and work with client teams to incorporate their models or code into the platform where possible. We would typically spend time interviewing a client’s data scientists and managers to understand their primary needs, and scope out the time and effort required to complete a successful integration.
Can Apteo provide data via an API?
Yes, we have an API available that client teams can use to request individual datasets or collections of datasets based on asset or topic/theme.
What kind of model files can you easily support?
After working with a client’s team and understanding the use case, we seamlessly integrate with pickle, TensorFlow, and PyTorch files.
How does Apteo work with enterprises to deploy the platform securely and in a compliance-friendly way?
We work very closely with our clients to provide our OneData platform in a secure and isolated tenant on the cloud. We replicate our entire infrastructure within a siloed environment, either on Apteo's cloud or on the client's cloud. We then work closely with clients to plug into any of their existing systems that they would like integrated into the product, whether those systems are user directories, which can be useful for single sign-on, or enterprise data stores, which can serve as sources for data made discoverable on our platform.
We understand the importance of data security and have architected our platform such that client data is always securely and logically isolated, and we’re happy to chat with clients about any additional safeguards. Finally, we work with clients to implement data connections, transformations, dashboards, and other features that they’d like to see in the platform.
How does Apteo handle updates and patches if the platform is hosted in the client’s cloud?
Apteo can either assume a limited-access management role in the client's cloud to manage our infrastructure, or we can work with the client to implement a remote patching and upgrading mechanism once an enterprise signs up for a long-term subscription.
Can clients host data on their cloud and host everything else on Apteo’s cloud?
We will do our best to be extremely flexible to satisfy a client’s security requirements. Today, our platform is architected in a way that makes it easy to plug and play different data sources, and depending on the complexity of a client’s setup, there may be an option to deploy with a multi-hosting model.
How do you ensure client data is secure, compliant, and protected?
We implement the enterprise version of our product in an isolated tenant environment, whereby the storage and compute resources dedicated to a client's implementation are always logically separated, and physically separated where possible. We use VPCs and whitelisted ports with end-to-end encryption to ensure client data is securely transmitted, and we use storage encryption to ensure data is encrypted at rest. Finally, we have a permissioning model built into our product that allows data owners to control who has access to datasets via the platform. We also work with enterprise compliance teams to ensure all data on the platform is acceptable. Our library of public data comes from fully public data sources that make their data available via an API. Any data sources that clients ask us to onboard will also receive sign-off from the client's compliance teams when necessary.
Can you encrypt S3 files for secure sending if a client were to host data on their own cloud?
Yes, we can work with enterprise IT teams to implement encrypted data transmission.
What support do you provide?
During the proof-of-concept phase, clients will have direct access to Apteo’s management, and can email or call us at any point. After the proof-of-concept stage and once the client purchases a long-term subscription, we issue an SLA that outlines the levels of support that we provide.
How quickly does your team move?
We value speed and move very quickly to implement new feature requests, onboard new data, and provide a high level of service.
Does Apteo customize its product based on client requests?
Yes! During the proof-of-concept implementation phase, we work closely with clients to enhance the product as needed. Our primary goal is to make sure we deliver a product that users will love.