Tuesday, January 5, 2010

The PowerPivot Experiment

Hello and Happy New Year 2010!

Wishing you all the best, a new year full of health, happiness, success and even more SQL Server knowledge!!! :)

As this is my first blog post in 2010, I thought that it should focus on a really hot topic! I hope you will enjoy it!

I called this post “The PowerPivot Experiment”. Here’s a little background information on how I decided writing this article.

When PowerPivot (formerly known as "Gemini") was released, I really wanted to test its real capabilities by trying to import a really large amount of information right into excel and perform aggregations. To this end, I designed a simple relational database with five tables, populated it with data and performed my experiment using PowerPivot. For the record, the total amount of data was more than 100 Million Rows.

In this post I will explain step-by-step the experiment, starting from the database design, then explaining the import process and finally, provide some examples on how I processed this huge amount of data.

Before describing the steps, I would like to list the several software technologies used, as well as the hardware and O/S configuration of my infrastructure.

Hardware –O/S Configuration:


  • OS Name: Microsoft Windows Server 2008 R2 Enterprise 
  • OS Version: 6.1.7600 Build 7600 
  • System Type: x64-based PC 
  • Processor: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz, 2001 Mhz, 2 Core(s), 2 Logical Processor(s) 
  • Installed Physical Memory (RAM): 4.00 GB 
  • Total Physical Memory: 4.00 GB 
  • Available Physical Memory: 2.47 GB 
  • Total Virtual Memory: 8.00 GB 
  • Available Virtual Memory: 6.10 GB 
  • Page File Space: 4.00 GB 
  •  
    Software tools/technologies used:


    Database Design
    The following database diagram displays the tables participating in the DB Schema used in my experiment along with the table sizes in terms of total number of records:























    Database size and other statistics:










    Importing the Data Using PowerPivot

    Step 1: Starting up Excel 2010 and launching PowerPivot Window










    Step 2: Establishing a connection to a SQL Server Instance


























    Step 3: Selecting SQL Server Database Tables for importing it into Excel 2010




























    Step 4: Start Importing Data (Time: 15:46)




























    Let’s take a look on the status of the import process as it is occurring:





















    Step 5: Import completed! Total Time: 15:5415:46 = 8 minutes!!!

















    It only took 8 minutes for the entire import process to complete! I personally believe that this is awesome! :)
    Total number of records imported: 40,000 + 60,000 + 500,000 + 100,000,000 + 5,000 = 100,605,000

    Working with the pivot tables
    Right after importing the data, I proceeded with creating two pivot tables (by clicking on the “PivotTable” button in the PowerPivot window) for being able to easily perform aggregations:




























    The following two screenshots are examples of some basic processing I did against the data I imported into Excel using PowerPivot. Using the PivotChart tools, I visually built OLAP queries answering business questions like: (i) What is the total sales amount by product, and (ii) What is the total sales amount by customer:

















    Note that you can also modify the charts as you could also do in earlier versions of Excel. For example, in the below screenshot you can see that the chart “Total Sales by Customer ID” was modified in order to illustrate the data using a pie chart:













     

    Saving the workbook
    After I finished with aggregating the data using the PivotChart tools, I saved the workbook. Note that by saving the workbook, the data that were imported earlier into the workbook using PowerPivot, are stored IN the workbook and you can access it anywhere as they are contained in the workbook file as a binary object.

    In my example, the database size was around 2300 MB. If you take a look at the following screenshot which presents the properties of the saved workbook, you can see that the file size is 732 MB:






















    Remarks
    After performing the above practice, I was able to see in action and test some of the powerful capabilities of PowerPivot.

    The data compression, the speed of the import process, the ease of creating rich Pivot Tables and OLAP queries are some of the great features of PowerPivot that take Business Intelligence a step further!

    Well, that was a description of my “PowerPivot Experiment” :)

    I hope you found this post useful!

    There is more to come on PowerPivot so make sure you check back my blog soon!
    [Ads]
    Check out my latest eBooks on SQL Server:
    Administering SQL Server - Ebook Tuning SQL Server - Ebook

    Easily secure your SQL Server instances! Try out DBA Security Advisor!
    DBA Security Advisor for SQL Server by SQLArtBits
    DBA Security Advisor was developed by SQLArtBits. The tool was being carefully developed for an entire year and it contains comprehensive security checks and recommendations based on proven SQL Server security best practices. Feel free to download and use the Community Edition which is free and if you find it useful, you can consider upgrading to the Enterprise Edition!

    Artemakis Artemiou [MVP]

    Author & Editor

    Artemakis Artemiou is a Senior SQL Server Architect, Software Developer and Microsoft Data Platform MVP. He is also an author, regular blogger, president of Cyprus.NET User Group and CY Country leader of INETA-EU. He is also the creator of DBA Security Advisor and In-Memory OLTP Simulator. Artemakis is a frequent guest author of worldwide well-respected online journals where he writes articles focusing on many SQL Server topics.

    Reference: The SQL Server and .NET Hub (http://www.sqlnethub.com)

    5 comments:

    UkraineTrain said...

    PowerPivot looks like a great tool. I was thinking about writing Excel applications in VB.NET that can handle vast amounts of data. Is there a way to import tables programmatically into the PowerPivot and manipulate them there?

    Artemakis Artemiou [MVP] said...

    Yes, it is an extremely powerful tool for doing BI!

    For programmatically importing data into PowerPivot and manipulate it would require a developer API.

    To my knowledge, at the time being there is not such an API available but it will be in the future!

    You can check out the following link for more information on PowerPivot: http://powerpivotpro.com

    Rushabh said...

    Thanks for the post! Any chance you could post a backup of your sample database? Did you use the 32bit version of Office and PowerPivot?

    Artemakis Artemiou [MVP] said...

    Hi Rushabh,

    You are welcome!

    I used the 64-bit versions for everything!

    You can download the DDL for generating the database using the following link:

    http://public.blu.livefilestore.com/y1pbPjaS5lw9ZYe51Fv6ySkG_cgRiJ7PBJVEXQjRFXeydNDY1EiHQza-00EzgzXmnzu49aInbSYWJ58UvvX9MW2QA/PowerPivotExperiment_100mln_Schema.sql?download

    I cannot upload the entire database (along with the data) as it is quite large!

    Though, you can use Red Gate's SQL Data Generator for generating the sample data. You can find the tool on the following link:

    http://www.red-gate.com/products/SQL_Data_Generator/index.htm

    If you have any other questions please let me know!

    Cheers!

    Artemakis Artemiou [MVP] said...

    Updated download link for the DB Schema DDL: http://cid-f7a5cb9a0405dcb5.office.live.com/self.aspx/.Public/SQL%5E_DDL/PowerPivotExperiment%5E_100mln%5E_Schema.sql