Advanced Analytic and Predictive Model Creation
There are two ways to create predictive models natively within the Spotfire platform: in the Spotfire UI, and with TERR (using the R language).
Creating Advanced Analytics in Spotfire
Spotfire is a powerful visualization and data discovery tool in which predictive models can be created and visualizations can be enhanced with advanced analytics (using tools in the UI, the Spotfire Expression Language, and/or R snippets via the TERR engine). Note that there is no separate manual for Spotfire: all the capabilities are described in the HTML help files (see Spotfire Help for full details; links to specific help files are listed below).
- Predictive Analytics in Spotfire White Paper: for an overview of our approach
- Spotfire Help: HTML help files for Spotfire
- Advanced Analytic Tools in Spotfire. None of these tools requires R scripting:
  - Data Relationships
  - K-means Clustering
  - Line Similarity
  - Hierarchical Clustering
  - Predictive Modeling (Linear and Logistic Regression, Classification and Regression Trees)
  - This tutorial shows how to create, evaluate, and predict with predictive models in Spotfire using these tools, without coding in R.
- Forecast Tool enables users to add a simple forecast to a Time Series plot with a single click
- For aggregations and descriptive statistics, the Spotfire Expression Language provides numerous functions that let users enhance their data.
- For quick R-based calculations, TERR can be called directly from the Spotfire Expression Language. For an example and full documentation, see this tutorial.
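As an illustration of the kind of quick R-based calculation meant in the last item, the snippet below standardizes a column to z-scores. It is only a sketch. It assumes the convention that the expression function exposes its column arguments to the script as `input1`, `input2`, and so on, and reads the result from a variable named `output`; the linked tutorial is the place to confirm the exact calling syntax. The sample values are made up so the snippet runs on its own.

```r
# Stand-in for the column values Spotfire would pass to the script.
# The name `input1` is an assumed convention, and the values are made up.
input1 <- c(10, 12, 9, 15, 11)

# Standardize to z-scores: subtract the mean, divide by the standard deviation.
# The result is assigned to `output`, the name assumed to be read back.
output <- (input1 - mean(input1)) / sd(input1)
```

Only the last statement would normally be passed from an expression; the first assignment exists purely to make the sketch runnable outside Spotfire.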
Creating Advanced Analytics in TERR using the R language
Advanced analytics and predictive models can be created and evaluated in TERR using the R language, following the same process as in open source R. TERR provides native implementations of most of the core models in R, and can load and run many additional packages from CRAN (1800+ currently). This means many more models are available natively in TERR, beyond the list of Spotfire tools above. For a full list of functions in TERR, see the TERR Language Reference.
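To make the create-and-evaluate workflow concrete, here is a minimal sketch using only core R functions (`lm`, `predict`). The data is synthetic and the variable names (`dat`, `x1`, `x2`, `y`) are made up for illustration; nothing here is specific to TERR, which is the point of the compatibility described above.

```r
# Synthetic data, generated only so the example is self-contained.
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1.5 + 2 * dat$x1 - 3 * dat$x2 + rnorm(n, sd = 0.5)

# Hold out 60 rows for evaluation and train on the rest.
test_idx <- sample(n, size = 60)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Create the model: linear regression of y on both predictors.
fit <- lm(y ~ x1 + x2, data = train)

# Evaluate the model: score the held-out rows and compute the RMSE.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))

print(coef(fit))
cat("Hold-out RMSE:", rmse, "\n")
```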
To make this easier for the end user, TERR is compatible with the RStudio IDE: RStudio users can point their IDE at an installed TERR engine and work in a more productive user interface.
The advantage for the data scientist in using TERR to create models is that TERR is faster and more memory efficient, so larger models can be created more quickly. For example, here are some performance results for fitting and scoring a Generalized Linear Model (GLM).
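The following sketch shows one way to time GLM fitting and scoring yourself. It does not reproduce the performance results referenced above: the data is synthetic, the sizes and coefficients are arbitrary assumptions, and the timings depend entirely on the machine and engine. Running the same script unchanged in each engine gives a like-for-like comparison.

```r
# Synthetic logistic-regression data; n and the coefficients are arbitrary.
set.seed(1)
n <- 100000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p <- plogis(-0.5 + 1.2 * dat$x1 - 0.8 * dat$x2)
dat$y <- rbinom(n, size = 1, prob = p)

# Time the model fit.
fit_time <- system.time(
  fit <- glm(y ~ x1 + x2, data = dat, family = binomial())
)

# Time the scoring step (predicted probabilities for every row).
score_time <- system.time(
  scores <- predict(fit, newdata = dat, type = "response")
)

cat("Fit time (s):  ", fit_time[["elapsed"]], "\n")
cat("Score time (s):", score_time[["elapsed"]], "\n")
```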
In addition, a number of customers are creating models on larger data sets using TERR with Hadoop, using KNIME for model creation, and collaborating with Lavastorm Analytics (who are an OEM for TERR). We also support running TERR on large grids, through TIBCO GridServer.
Analytic Applications for Analysts and Other Business Users
Once these advanced analytics have been created, they can be integrated into Spotfire applications, to allow a wider community of users within an organization to leverage advanced analytics for better decision-making.
Within Spotfire, the TERR engine (or another statistical engine) is used to help business users make better decisions through the Data Functions framework, which provides a way of building very flexible and powerful analytic applications.
- What are Data Functions? provides a general overview of Data Functions, and How to Use Data Functions provides an overview of how to create and use data functions within Spotfire.
- This tutorial provides a walk-through of the capabilities. The same method is used whether calling TERR/R scripts, SAS code, MATLAB code, etc.
- These applications can be quite sophisticated, enabling a business to build and evaluate their own predictive models for a given business problem, and perform sophisticated What If analysis. This video shows an application for Retail Optimization, and this example shows a Supply Chain Optimization application which also leverages GeoAnalytics capabilities. These applications are all built using native Spotfire.
- In addition, further predictive capability is available through Spotfire’s ability to call SAS, MATLAB, and other analytic engines through the same unified Data Function interface.
- There are customers in the Finance and Oil & Gas industries using MATLAB from Spotfire, and customers in Pharma and Finance calling SAS from Spotfire.
- There are also out-of-the-box templates with Spotfire for calling both SAS and MATLAB.
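As a rough illustration of what the script inside a TERR data function might look like for a simple What If analysis, consider the sketch below. It assumes a data function whose inputs and outputs are mapped to R variables; every name here (`sales`, `what_if_discount`, `predicted_sales`) is hypothetical, and the inputs are simulated so the script runs on its own. In a real application they would be supplied by Spotfire rather than generated in the script.

```r
# --- Stand-in inputs (hypothetical names, simulated values) ----------------
# In a data function these would arrive from Spotfire as input parameters.
set.seed(7)
sales <- data.frame(
  price    = runif(120, min = 5, max = 20),
  discount = sample(c(0, 0.1, 0.2), 120, replace = TRUE)
)
sales$units <- round(
  200 - 6 * sales$price + 150 * sales$discount + rnorm(120, sd = 10)
)
what_if_discount <- 0.15  # scalar input a business user could change

# --- Script body ------------------------------------------------------------
# Fit a simple model of units sold on price and discount.
fit <- lm(units ~ price + discount, data = sales)

# Build the What If scenario: same prices, discount overridden by the input.
scenario <- sales
scenario$discount <- what_if_discount

# Output table (hypothetical name) that would be returned to Spotfire.
predicted_sales <- data.frame(
  price           = scenario$price,
  discount        = scenario$discount,
  predicted_units = predict(fit, newdata = scenario)
)
```

Keeping the model fit and the scenario scoring in one script keeps the sketch short; a production application might fit the model once and only re-score when the user changes an input.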
Deploying Advanced Analytics into real time applications
In addition to integration into Spotfire, models developed in TERR can be deployed into real-time applications. The TERR engine integrates with TIBCO CEP offerings, including TIBCO StreamBase and TIBCO BusinessEvents. This allows analytics to be developed in R, then deployed into environments for real-time model scoring and decision-making. Examples of applications we are working on with customers include:
- Logistics Optimization
- Port Congestion Detection
- Maritime Abnormality Detection
- Predictive Maintenance for Oil & Gas
- Severe Weather Alerts Tracking for Facilities
- Customer Loyalty Analytics: Deliver real-time predictions on whether to extend an offer to a given customer
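The develop-then-deploy pattern behind applications like these can be sketched in plain R, using the customer loyalty case as an example. This is an assumption-laden illustration, not the StreamBase or BusinessEvents integration itself: the training data is synthetic, the column names, the `score_event` helper, and the 0.5 threshold are all hypothetical, and the hand-off to the event-processing engine is not shown.

```r
# --- Offline: train the model and persist it --------------------------------
set.seed(123)
n <- 1000
history <- data.frame(
  visits_last_30d = rpois(n, lambda = 4),
  avg_basket      = rgamma(n, shape = 2, rate = 0.05)
)
p_accept <- plogis(-2 + 0.3 * history$visits_last_30d + 0.01 * history$avg_basket)
history$accepted <- rbinom(n, size = 1, prob = p_accept)

model <- glm(accepted ~ visits_last_30d + avg_basket,
             data = history, family = binomial())
model_path <- file.path(tempdir(), "offer_model.rds")
saveRDS(model, model_path)

# --- Online: load the model once, then score each incoming event ------------
scoring_model <- readRDS(model_path)

# Hypothetical helper: returns the predicted acceptance probability and a
# yes/no decision on whether to extend the offer.
score_event <- function(event, model, threshold = 0.5) {
  prob <- predict(model, newdata = event, type = "response")
  list(probability = unname(prob), extend_offer = unname(prob >= threshold))
}

# A single made-up event, shaped like one row of the training data.
event <- data.frame(visits_last_30d = 9, avg_basket = 60)
decision <- score_event(event, scoring_model)
print(decision)
```

Loading the model once and reusing it for every event keeps the per-event cost down to a single `predict` call, which is what matters for real-time scoring.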
Enterprise-grade R engine
To support the above use cases, it is critical to have an R engine that is fast and robust enough to work in production, so that R code can be developed and deployed without having to be rewritten using specialized libraries or in an alternative environment. That is the core purpose of the TERR engine. In more detail:
- TERR is faster than open source (OS) R:
  - For small to moderate-size data sets, TERR is 2-10x as fast as OS R on many common operations
  - For larger data sets, TERR is 10-100x as fast as OS R on common operations (e.g., model scoring) and on complex, real-world scripts
- TERR is more robust than OS R, with more efficient memory management. It was architected from its initial design for 64-bit architectures (unlike OS R). This means much larger data sets can be handled in memory, and performance generally stays linear with data size (unlike OS R, where memory leaks can accumulate and eventually crash the program)
- TERR is fully supported by TIBCO
- Since TERR is not GPL, it can be licensed by TIBCO to vendors who want to implement a tight, efficient integration with R analytics. (GPL often forces vendors to implement loose connections to avoid threats to their IP)
In addition, since TERR is part of the Spotfire platform, it can leverage other capabilities there, including direct connections to a large variety of Big Data sources (including Hadoop). TERR can reuse Spotfire Data Connections directly, and so has connectivity to all of these data sources.