Current Issues in Data Replication

Everything you need to know to efficiently select and implement
the optimal replication solution for your company.

Historical Perspective

When distributed client/server systems began to proliferate in the mid-1980s, very few RDBMS experts immediately saw the need for data replication technology. Those who did were primarily experienced IT managers working with very large systems, who had already encountered the multitude of problems that can arise when shared data is distributed across multiple locations.

These managers realized that users could only access current data if that data resided on a local server. Users had to wait until nightly batch processing updated their local server with information from remote locations before they could access all of the company's current data. Using data replication technology, however, users could access current information at all times.

The companies that valued this benefit most, and therefore invested in replication technology first, were primarily large financial industry firms. They had the money to invest in leading-edge technologies, including the experimental development of home-grown replication software. And they understood that their investment in replication technology would directly return a significant increase in profits.

Discovering the Benefits of Replication Technology

Initially, many financial firms used replication technology to provide their offices with up-to-the-moment data about ongoing trading activities at other offices in the different financial markets where the company was trading. For example, a firm's London, Tokyo, and Singapore offices could relay information about their trading activities and positions to their New York office throughout the day, so that analysts and traders in New York could view the firm's portfolio as a whole while making trading decisions. Managers could better coordinate the company's trading activities across global markets, and maximize profits on the firm's entire portfolio.

After working with distributed database systems for only a short time, financial industry companies also realized that they needed a better way to ensure system fault tolerance and high availability of data. The inability to execute even one trade could cost the company millions of dollars. Again, they discovered that data replication, when properly implemented as a software solution, could provide system fault tolerance and high availability of all important data to all system users.


Major Benefits of Replication Technology

  Data Sharing         Fault Tolerance     Gradual System Migration
  High Availability    Hot-Site Backup     More Economical than Hardware Solutions

Replication Catches On

By the late 1980s, many large companies in different industries had also migrated to distributed systems. They, too, soon understood that replication technology could offer benefits in any situation where users needed access to all of their company's locally and globally distributed data at the same time.

As these companies also began to implement home-grown data replication solutions, they discovered that data replication software could provide even more benefits, including gradual migration of systems from the mainframe to a client/server environment. Replication could also enable a remote site to serve as a backup "Hot-Site".

Home Grown Replication Solutions

By the very early 1990s, many large companies had begun to develop some kind of home-grown data replication solution. However, most quickly discovered that designing replication software that would meet all of their requirements and actually work well in their environment was not as easy as it first sounded.

Many IT managers started their replication development projects with the misconception that replication simply involved "copying data". This, combined with the general lack of knowledge about replication solutions, caused cost overruns on the majority of projects. Many projects failed, and many others developed only limited solutions.

Proliferation of Replication Products

At this time, product vendors took the opportunity to enter the growing replication market, and began to develop off-the-shelf data replication products.

In 1990, AFIC Technologies, Inc., a small international software development company headquartered in New York City, released a software product providing data replication, fault tolerance, and high availability. This product was sold to multinational clients worldwide, including Moody's Investors Service, Merrill Lynch, Chemical Bank, Phibro Energy Division of Salomon Brothers, and others.

By 1993, The Ask Group (later acquired by Computer Associates) and Sybase, Inc. had also introduced products for data replication. Their first customers consisted primarily of large financial industry companies that had not yet successfully developed their own replication solutions. By 1995, most major RDBMS vendors had introduced a replication product of their own to the growing replication solution market. These included Oracle, Informix, Computer Associates (Ask/Ingres), and Microsoft.

What We Learned

I worked as the chief developer of the AFIC product, and my current business partner managed the company's operations. We developed the product for Sybase on Sun, and we later ported it to various RDBMS and platforms.

When we were designing a replication product that could be sold "out of a box", we were really given the task of creating a generic solution that could work in virtually any environment. I have always suggested, and continue to suggest, that this is not feasible. First, distributed database systems tend to be complex and heterogeneous, coming in too many shapes and sizes. Moreover, a proper replication solution must be fitted to too many variables.

Unfortunately, I was right in regard to the AFIC product. In the end, every customer needed something different. The product out of the box was either too big, or too small, or did not fit exactly into the customer's complex heterogeneous environment. So we spent a lot of our time (and the customer's money) making the product fit each company's requirements.

Since we left AFIC in 1994, we've worked with virtually all of the replication products developed by other major vendors (Sybase, Oracle, CA-Ingres, etc.) as they've become available. We've also developed completely customized solutions for a variety of needs and environments.

Unfortunately, we've discovered that I was also right in regard to all of the other commercially available replication products that we've seen. Even the products offered by Sybase, Oracle, and other major vendors often require a large amount of customization to fit into each company's environment. They also require a significant amount of work to configure and maintain. Customization, configuration, and maintenance costs often far exceed the cost of the product itself. In many cases, the products simply cannot be made to meet the company's specific requirements.

Due to the complexities of selling a standardized replication product, and stiff competition from the "big boys", AFIC stopped selling its replication product in 1995. While Sybase is the current market leader in off-the-shelf replication products, the issues outlined above remain current and apply to its replication product, and to all of the other products that we've seen being sold off-the-shelf today.

Shopping for a Replication Solution

When we started selling replication software, we had to begin by telling customers what replication was and what it did. The market has become much more sophisticated since then.

Most technology professionals working with distributed systems now recognize the need for data replication technology. However, they have listened to a lot of hype from a lot of marketing gurus at a lot of major database vendors - and they may not have a real idea of what to look for in a data replication solution that will meet their needs and work in their environment.

The best data replication solution for you is heavily dependent on your goals and environment. Issues you'll want to think about before selecting a solution include:

  1. What are your goals? (for example, data sharing, high availability, fault tolerance, hot-site backup, or some combination of these?)
  2. What are your absolute minimum requirements and budget? Do you have a realistic budget for the functionality you require?
  3. What granularity of replication (database, table, row, or column level) do you need?
  4. Can you partition your data, or must you allow multiple ownerships? In case of multiple ownerships, can you define a clear set of conflict resolution rules?
  5. How tightly synchronized must your data be? Do you really need up-to-the-moment data, or can you replicate periodically (for example, every few hours or once per day)?
  6. Do you require simultaneous updates at remote sites?
  7. What kind of communication lines can you allocate?
  8. What bandwidth can you allocate?
  9. How many licenses will you need to buy? What will this cost?
  10. Will you need to expand your system in the future, for example to other remote locations? How many more licenses will you need to buy? What will this cost?
  11. Do your people have the experience to design, implement, and maintain a comprehensive solution on the first try?
  12. What will be the total cost of the complete solution, including research, license and user fees, installation, configuration, customization, training, testing and system downtime during this phase, plus internal and external ongoing support?
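
To make question 4 concrete, here is a minimal sketch of what "a clear set of conflict resolution rules" might look like when two sites update the same row. It is only an illustration, not the method used by any product named in this article. The rule chosen here (the newest update wins, and a fixed site ranking breaks ties) is one common convention among many. The names `RowVersion`, `SITE_PRIORITY`, and `resolve_conflict`, the site names, and the assumption that timestamps are comparable across sites are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical site ranking, used only to break timestamp ties
# (a lower number means a higher priority).
SITE_PRIORITY = {"new_york": 0, "london": 1, "tokyo": 2}


@dataclass
class RowVersion:
    """One site's version of the same row (all field names are illustrative)."""
    site: str
    updated_at: float  # seconds since epoch; assumed comparable across sites
    values: dict


def resolve_conflict(a: RowVersion, b: RowVersion) -> RowVersion:
    """Return the winning version: newest update wins, site rank breaks ties."""
    if a.updated_at != b.updated_at:
        return a if a.updated_at > b.updated_at else b
    return a if SITE_PRIORITY[a.site] <= SITE_PRIORITY[b.site] else b
```

If you cannot write your rules down this plainly for every table with multiple owners, that is a sign you may need to partition the data instead.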

Development of the Optimal Replication Solution

  1. ANALYZE - how replication can be used optimally in your system.
  2. DEFINE - your requirements and all available resources.
  3. COMPARE - alternative theoretical approaches to replication implementation.
  4. EVALUATE - commercially available replication products.
  5. CONSIDER - alternatives including in-house or customized development.
  6. SELECT - the most cost-effective solution that will exactly match your requirements.

Comparing Replication Products

A side-by-side comparison of available replication products is a time-consuming and detailed process, involving extensive research. In the end, you will probably find that each vendor's product has its advantages and disadvantages.

Your best approach is to clearly define your needs by answering the questions above. Then, shop for the solution which most closely matches your requirements.

Although this may seem obvious, many people begin their shopping in reverse. They first gather information about all of the available replication products, and then simply choose the one that offers the most features. In doing this, they may fail to see that many of the features they end up paying for are just bells and whistles in relation to their actual requirements.

Others simply select the product offered by their database vendor, for a variety of reasons. The database vendor may seem to offer the only solution that will work well with their system. Or, the purchaser may believe that "The replication product is already paid for under the company's bulk purchase agreement". These people often fail to account for the indirect costs they incur when they overlook alternative solutions and choose a product that does not exactly match their requirements.

Considering Custom Developed Replication Solutions

Once you have carefully defined your requirements, and you understand which replication product comes closest to meeting them, you should also consider whether a custom developed solution may be a better option for you.

A custom developed replication solution actually can be more cost effective than an off-the-shelf product. First, it will be designed specifically to match all of your requirements, and only your requirements. You will not have to pay for features you really don't need. Also, you will pay a fixed price for this solution, which can offer substantial savings over products that are priced according to the number of servers and users involved. You will not have to pay additional license fees if you later decide to add more servers and/or users.
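
The licensing arithmetic behind this comparison can be sketched in a few lines. The code below is a rough illustration only. The function names and the pricing model (a flat fee per server plus a flat fee per user) are assumptions, and real vendor price lists are usually more complicated. It also leaves out customization, configuration, and maintenance costs, which apply to both options and belong in a full "build vs. buy" analysis.

```python
import math


def per_unit_license_cost(servers: int, users: int,
                          fee_per_server: float, fee_per_user: float) -> float:
    """License cost of a product priced by server and user counts (assumed model)."""
    return servers * fee_per_server + users * fee_per_user


def break_even_servers(fixed_price: float, users_per_server: int,
                       fee_per_server: float, fee_per_user: float) -> int:
    """Smallest number of servers at which per-unit licensing exceeds the fixed price."""
    cost_per_site = fee_per_server + users_per_server * fee_per_user
    if cost_per_site <= 0:
        raise ValueError("per-unit fees must add up to a positive amount")
    return math.floor(fixed_price / cost_per_site) + 1
```

With purely hypothetical figures, `break_even_servers(100_000, 50, 15_000, 100)` reports the server count beyond which the per-unit product costs more in license fees than a fixed-price solution of 100,000.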

Your "build vs. buy" analysis should consider that much knowledge about the development of customized replication solutions has been gathered in the last five years. You can take advantage of the experiences of others who have already worked extensively with replication technology during this time.

For example, you can avoid many pitfalls by learning from the experiences of developers who created in-house replication solutions even before any off-the-shelf replication product was available. You can avoid other mistakes by learning from the development experiences of replication product vendors while developing, releasing and upgrading their products. And, you can learn a lot from the developers who have implemented the extensive customization that has been done to off-the-shelf solutions to make them work for each customer!

Conclusion

While the number of standardized data replication products in the market has grown over the past five years, there is still no 'one size fits all' solution available. In fact, a proper 'one size fits all' data replication product probably will never exist.

Gain a thorough understanding of the issues to consider when selecting the best data replication solution for your needs and environment before beginning to shop for a solution. Do not expect to find a standardized solution that fits your environment exactly, and be prepared to spend resources on installation, configuration, and customization. Then, you may be happy with any one of the replication products currently available.

Finally, consider your alternatives seriously: (1) a home-grown solution developed with or without the assistance of experienced replication experts; or (2) a completely customized solution created for you by experienced replication solution developers.

by Sergey Fradkov
Sr. Technical Specialist, UNIF/X Inc.

UNIF/X helps companies select or develop replication solutions to exactly match their needs and environment. You may contact Sergey at UNIF/X Inc., 67 Wall Street, Suite 2411, New York, NY 10005. Email: sergey@unifx.com. Telephone: 212-406-1400.

Author Biography
Sergey Fradkov

Mr. Fradkov serves as a Senior Technical Specialist and manager of UNIF/X development teams in the US. He has been working with relational databases and replication technology for over seven years.

After receiving his Master's Degree in Computer Science, Mr. Fradkov led the design, development, and implementation of various leading edge technologies internationally. He also served as the Head of Product Design and Development for AFIC Technologies, Inc. - leading development teams in the US and Israel to produce one of the very first generally available database replication software products to be sold to major clients worldwide.

Currently, Mr. Fradkov provides high-level consulting and project management services, drawing on his expertise with distributed database systems (including data replication, database connectivity, and distributed application development), as well as World Wide Web business systems development.

