Raw Volume Creates Challenges
What does the “Big” in “Big Data” refer to? Most people will answer: volume. Modern life produces incredible amounts of data. Tasks that are simple at small scale, such as adding records, indexing, and purging old data, become time-consuming and expensive as volume grows.
But the quantity of stored data is only one dimension of volume. Other dimensions that demand escalating resources as data grows include:
- The number of queries and lookups performed against the data
- Exports to outside systems
- Normalizing and de-normalizing transformations
- Data grooming tasks
The more important your data becomes, the more reliable the access your users demand. Regular service outages become unacceptable. Even one-second delays explode into budget-busting project killers when they are multiplied over tens or hundreds of thousands of queries: a single extra second across 100,000 queries adds up to nearly 28 hours of cumulative waiting.
Are Aggregates the Answer?
Each time you aggregate data, you simplify the data itself but increase the complexity of your system. Aggregates can reduce complexity for your users and help manage volume, for example by:
- Stripping out personally identifiable details
- Creating demographic clusters from geotagged records
- Grouping customers by product
Make no mistake: your users will demand many aggregates. Every one you create adds both processing time and maintenance overhead to the entire system.
Escalating Demand Amplifies Performance Challenges
These colliding constraints put your system in a vise:
- the more data you have, the more time you need to manage it
- the more data you have, the more aggregates and summaries users will create
- the more data you have, the more valuable your system becomes to your users
- the more valuable your system is to your users, the less time it can be out of service
- the more valuable your system is to your users, the faster it must respond to their increasing demands
If this spiraling demand sounds familiar, you are confronting Extreme Performance.