Big data storage and analytics is not just about MapReduce and Hadoop. A number of specialised database products have emerged that aim to solve the problems of data scale while keeping cost, complexity and manageability in check. One such database is RainStor, from the company of the same name. We spoke to RainStor's CEO, John Bantleman, about the database and how it is being deployed in the financial markets.
Q: First off, how did the company come to be, and what are the main business problems that RainStor is tackling?
A: RainStor was founded in 2004 to design a database from the ground up that manages big data efficiently, at the lowest cost and with minimal complexity. RainStor's patented database is append-only, which makes it ideal for environments where high-volume, high-velocity machine-generated data must be collected and stored for ongoing query and analysis.
Today's businesses are experiencing rapid data growth as well as an increase in the number of data sources, which means data arrives in multiple structures. Depending on the industry sector, compliance regulations dictate that data must be kept for specific periods (usually years) and, more importantly, business users demand ongoing access to this data in order to follow trends and patterns, report, and better predict future events. Even though hardware costs continue to drop, data growth rates outpace IT budgets, so large enterprises, particularly in finance and banking, are rapidly adopting new data management and analytics approaches.
Q: What does your offering consist of? What are the components? And what are the key features of it?
A: RainStor's big data database can be deployed as a primary or secondary database, depending on the specific use case and business requirement.
RainStor is a software-only solution and can run on-premise or in a private or public cloud. When deploying, you can choose a SAN, NAS, CAS or Hadoop Distributed File System (HDFS) platform. RainStor will happily run and scale on low-cost commodity hardware, but the choice of storage and hardware configuration is up to the individual client.
Key features include:
Ingest: Data is rapidly ingested in a number of different formats from sources including databases, network logs and flat files (CSV or BCP format). RainStor scales to ingest extreme volumes of up to 30,000 records per second per core.
Reduce: By storing only the unique field and pattern values contained within each imported record, RainStor de-duplicates the data, resulting in extreme compression: up to 40x compared to the raw data.
Comply: RainStor’s auto-expiry of data from the repository is based on configurable retention policies that also support legal hold.
Analyse: The data in RainStor, while highly compressed, remains accessible using standard SQL, popular BI tools and MapReduce on Hadoop. The data does not need to be re-inflated (decompressed) before analysis, and results are returned with consistently high performance.
Manage: RainStor requires no specialist DBA skills to install and maintain, and because it needs no special tuning or indexes, it requires low to zero maintenance over time.
Scale: As data volumes grow, the underlying hardware and storage platforms can be easily extended to meet additional demand.
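The Reduce feature describes storing each unique field value only once. RainStor's patented storage format is not described in the interview, so the following is only a generic sketch of the underlying idea, field-level dictionary encoding: each distinct value in a column is stored once, and each record keeps small integer references to those values. The class `DedupStore`, its methods and the sample rows are hypothetical, and the sketch does not model RainStor's "pattern values" or its actual compression ratios.

```python
from typing import Dict, List, Sequence, Tuple


class DedupStore:
    """Hypothetical store that keeps each distinct field value once per column."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = list(columns)
        # One dictionary per column mapping value -> integer id.
        self.value_ids: List[Dict[str, int]] = [{} for _ in self.columns]
        # Reverse lookup per column: position in the list is the id.
        self.values: List[List[str]] = [[] for _ in self.columns]
        # Each stored record is a tuple of ids, one per column.
        self.records: List[Tuple[int, ...]] = []

    def ingest(self, record: Sequence[str]) -> None:
        ids = []
        for col, value in enumerate(record):
            table = self.value_ids[col]
            if value not in table:
                # First time this value appears in this column: store it once.
                table[value] = len(self.values[col])
                self.values[col].append(value)
            ids.append(table[value])
        self.records.append(tuple(ids))

    def reconstruct(self, index: int) -> List[str]:
        # Rebuild the original record from its per-column ids.
        return [self.values[col][i] for col, i in enumerate(self.records[index])]

    def distinct_value_count(self) -> int:
        return sum(len(v) for v in self.values)


store = DedupStore(["symbol", "venue", "side"])
rows = [
    ["ABC", "EX1", "BUY"],
    ["ABC", "EX1", "SELL"],
    ["XYZ", "EX1", "BUY"],
    ["ABC", "EX1", "BUY"],
]
for row in rows:
    store.ingest(row)

# 12 field values were ingested, but only 5 distinct values are stored.
print(store.distinct_value_count())  # 5
print(store.reconstruct(2))          # ['XYZ', 'EX1', 'BUY']
```

Because records refer to values rather than repeating them, the savings grow with how repetitive the data is, which is typical of machine-generated logs and trade feeds.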
Q: How does it fit in with Hadoop and MapReduce - and those companies that are already using them?
A: The RainStor database runs natively on Hadoop's Distributed File System (HDFS), so all the features and benefits you get from running RainStor in a NAS environment also apply when you run it on Hadoop. Your data footprint is reduced because of the granular compression, so you need fewer nodes. Additionally, queries run 10-100x faster than with standard Hadoop MapReduce because of built-in filtering and the ability to dynamically eliminate files that do not contain any part of the result set. RainStor also lets you choose SQL or MapReduce depending on the query, which improves productivity because SQL is a standard skill in every IT department.
RainStor also provides built-in security and availability, which large enterprises expect when they deploy mission-critical environments on Hadoop. RainStor can be deployed alongside whichever Hadoop distribution the customer has chosen (it supports Apache, Cloudera, Hortonworks, IBM and MapR), including into a Hadoop environment the customer has already deployed.
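The file-elimination idea can be illustrated with a common, generic technique often called min/max (or zone-map) pruning: each file records the minimum and maximum value of a column, and a query skips any file whose range cannot overlap its predicate. The interview does not describe RainStor's actual filtering mechanism, so this is only a sketch of the general approach; `FileStats`, `files_to_scan` and the file paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: the min and max trade date stored in the file."""
    path: str
    min_date: date
    max_date: date


def files_to_scan(files: List[FileStats], start: date, end: date) -> List[str]:
    """Return only the files whose [min_date, max_date] range overlaps [start, end].

    Files that cannot contain matching rows are eliminated without being opened.
    """
    return [f.path for f in files if f.max_date >= start and f.min_date <= end]


catalogue = [
    FileStats("trades/part-0001", date(2012, 1, 1), date(2012, 3, 31)),
    FileStats("trades/part-0002", date(2012, 4, 1), date(2012, 6, 30)),
    FileStats("trades/part-0003", date(2012, 7, 1), date(2012, 9, 30)),
]

# A query for May 2012 only needs to read one of the three files.
selected = files_to_scan(catalogue, date(2012, 5, 1), date(2012, 5, 31))
print(selected)  # ['trades/part-0002']
```

The speed-up from this kind of pruning depends on how well the data is clustered by the filtered column; a date-ordered, append-only feed is a favourable case because each file covers a narrow, non-overlapping range.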
Q: What is the typical hardware and systems software platform for production implementations?
A: A typical server would comprise Red Hat Enterprise Linux, 12-core Xeon processors and 24-36 GB of RAM. You then scale according to ingest rate, query load, and overall data size and growth.
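As a rough, back-of-the-envelope illustration, the figures quoted earlier (up to 30,000 records per second per core, up to 40x compression) can be combined with the server profile above. This assumes the peak ingest rate holds on every core and that a server has 12 cores in total; both are optimistic assumptions, and real throughput and compression will depend on record size, format, data repetitiveness and concurrent query load. The helper functions and the daily volume used below are hypothetical.

```python
import math

RECORDS_PER_SEC_PER_CORE = 30_000  # upper bound quoted in the interview
CORES_PER_SERVER = 12              # assumption: 12 cores in total per server
COMPRESSION_RATIO = 40.0           # upper bound quoted in the interview
SECONDS_PER_DAY = 24 * 60 * 60


def peak_ingest_per_server(cores: int = CORES_PER_SERVER) -> int:
    """Best-case records per second for one server."""
    return RECORDS_PER_SEC_PER_CORE * cores


def servers_needed(records_per_day: int) -> int:
    """Minimum servers to keep up with a sustained daily volume at the peak rate."""
    per_server_per_day = peak_ingest_per_server() * SECONDS_PER_DAY
    return max(1, math.ceil(records_per_day / per_server_per_day))


def compressed_footprint_tb(raw_tb: float, ratio: float = COMPRESSION_RATIO) -> float:
    """Best-case stored size for a given raw data volume."""
    return raw_tb / ratio


print(peak_ingest_per_server())        # 360000
print(servers_needed(50_000_000_000))  # 2 (hypothetical 50 billion records/day)
print(compressed_footprint_tb(400))    # 10.0
```

Treat the results as a lower bound on hardware: any shortfall against the quoted peak rates increases the server and storage count proportionally.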
Q: What are some financial markets applications where it has found success? How is its performance compared to traditional SQL and Hadoop data stores?
A: RainStor's database has been deployed at a number of financial services organisations, and the exact architecture depends on the business use case. Financial institutions are driven by compliance regulations that require them to retain transaction and trade data for specific timeframes. When data growth outpaces the capacity of the existing database or data warehouse, they need a plan for ongoing storage and access at a lower TCO. RainStor augments the data warehouse: older data is offloaded to a RainStor instance, where it is retained at much lower cost and auto-expired as compliance rules dictate. Without RainStor, you may have to move the data to offline tape, which is not queryable, is very difficult to restore and is not easy to auto-expire, all of which makes compliance harder.
A second use case for financial institutions is to land data from internal and external networks directly on a Hadoop-based platform running RainStor's database. This could include customer activity across various online applications, or trade and tick data that must be retained for specific periods after the transactions take place. Because RainStor lets you run queries via SQL or your favourite BI tool as well as Hadoop MapReduce, productivity improves and overall costs fall. RainStor's unique dynamic filtering capabilities also deliver 10-100x faster query and analytics response times than standard Hadoop MapReduce.
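The offload-and-expire pattern can be sketched as a record-level retention check: a record is expired once it is older than its retention period, unless it is under legal hold. This is a generic illustration, not RainStor's retention policy engine; the `Record` type, the seven-year retention period, the functions and the sample trades are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical trade record with its trade date and a legal-hold flag."""
    trade_id: str
    trade_date: date
    legal_hold: bool = False


RETENTION_YEARS = 7  # hypothetical retention period set by a compliance rule


def retention_cutoff(today: date, years: int = RETENTION_YEARS) -> date:
    """Records dated before this cutoff have passed their retention period."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is 29 February and the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


def expire(records: List[Record], today: date) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, expired); records on legal hold are always kept."""
    cutoff = retention_cutoff(today)
    kept: List[Record] = []
    expired: List[Record] = []
    for rec in records:
        if rec.trade_date < cutoff and not rec.legal_hold:
            expired.append(rec)
        else:
            kept.append(rec)
    return kept, expired


records = [
    Record("T1", date(2004, 3, 15)),
    Record("T2", date(2004, 3, 15), legal_hold=True),
    Record("T3", date(2010, 6, 1)),
]
kept, expired = expire(records, today=date(2012, 6, 1))
print([r.trade_id for r in kept])     # ['T2', 'T3']
print([r.trade_id for r in expired])  # ['T1']
```

Checking the legal-hold flag per record, rather than per file or partition, is what allows expiry to be applied at the record level while still preserving data that is subject to litigation.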
Q: What's next for RainStor in terms of your business and technology directions?
A: RainStor continues to build out its business worldwide, selling both directly and through its strategic partners, and helping large enterprises improve the overall management and analytics of their critical data assets. RainStor is fast and easy to deploy and supports the enterprise standards that banks and financial services firms have come to expect, including built-in security and availability as well as SQL and well-known BI tools. RainStor fundamentally believes that big data doesn't have to cost "Big Bucks", and when organisations need to reduce IT infrastructure costs, RainStor is a must-have. A key requirement for banking and financial institutions is greater control over the data being managed, and RainStor's unique capability to auto-expire data down to the record level makes it a natural fit for the industry.
RainStor continues to improve its products, focusing on its unique combination of capabilities, which include rapid data ingestion, compression, fast query and analysis, compliance, scale and overall ease of use, to provide a solution with the lowest total cost.