3/01/2012

The Road to Professional Database Development: Set-Based Thinking

Under the pseudonym of 'SwePeso', Peter Larsson is famous on SQL forums for the amazing performance he can get from SQL. How does he do it? In the first of a series of articles, Peter explains his secrets.
The single most common barrier to writing efficient database code, for a developer versed in a typical client language (such as C#), is an inability to leave behind the habit of working on one row at a time, in favor of working with a set (a collection of rows).
What is the difference? Let's examine a very simple example of summing up 5 rows in a table. The following code creates a table variable (@Sample) and populates it with the rows of data.
DECLARE @Sample TABLE ( Data INT )

INSERT @Sample ( Data )
VALUES ( 1 ), ( 2 ), ( -4 ), ( 5 ), ( 8 )

SELECT Data FROM @Sample
For a .NET developer this pseudo-code would be the way to perform this simple sum:
Dim intS As Integer = 0 ' Initialise the sum to zero
Dim objR As Row

For Each objR In coll.Rows
intS += objR.Data
Next

Debug.Print intS
This is the standard approach for a procedural language; we not only tell the compiler what we want (the sum of all objects) but also how we want to approach the objects when calculating the sum. In this example, the coll variable holds a collection with objects of type Row. The For-Next loop iterates through all the objects and, for each object, fetches the current value from the Data property and adds the value to the intS variable. In other words the code operates on one row at a time.
T-SQL is a declarative language which means that the programmer is supposed to tell the database server what is required but let SQL Server decide the best way to finalize the task. However, many programmers force the procedural approach on SQL Server, with code such as the following:
DECLARE curSum CURSOR FOR
SELECT Data FROM @Sample

DECLARE @s INT, @Total INT = 0

OPEN curSum

FETCH NEXT FROM curSum INTO @s

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Total += @s
    FETCH NEXT FROM curSum INTO @s
END

CLOSE curSum
DEALLOCATE curSum
SELECT @Total
Even though this code is written in T-SQL, it still operates on one object (row) at any given time. This approach is commonly referred to as Row-by-Agonizing Row (RBAR). If we increase the number of rows tenfold, the time to complete the task would increase by a factor of ten; for a very large table (100 million rows or more) this sum could easily take hours to complete.
So how do you do this in a set-based way? Luckily for us, T-SQL has an aggregate function available to do the job for us.
SELECT SUM ( Data ) FROM @Sample
The following table compares the performance of the cursor method versus the set-based method for an increasing number of rows.
Rows     CURSOR                   SET-BASED                Performance factor
         Duration (ms)  Reads     Duration (ms)  Reads     Duration   Reads
1k       47             4009      0              7         -          953
10k      492            40004     3              42        164        953
100k     5189           400389    28             388       179        1039
1000k    51358          4003848   286            3847      180        1041
As you can see, both solutions scale linearly: when we increase the number of rows by a factor of ten, the query duration increases by roughly the same factor. However, there is one big difference, and that is that the set-based solution is about 180 times faster, and puts only about 1/1000th of the read pressure on the server!
So why is the performance difference so vast? We'll discuss this in full detail in a later article in this series, but in essence, the set-based approach allows SQL Server to return all the required rows very efficiently, fully exploiting the underlying data storage mechanisms. The internal structure where the data is stored in SQL Server is called a Page, and it defines the minimum amount of data read from or written to the database, regardless of whether you want all rows in that Page or only one row. So, with a set-based approach, we only need to fetch a page once to get all x rows stored within, whereas with the RBAR approach, we must re-fetch the page x times. We'll also see in a later article how, with appropriate indexes in place, we can speed up this process still further.
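As a rough back-of-the-envelope illustration of this page-fetch argument, here is a short sketch in Python (used purely to model the arithmetic; the rows-per-page figure is a made-up assumption, not a SQL Server constant):

```python
import math

def logical_reads(total_rows, rows_per_page=100):
    """Rough model of page fetches: a set-based scan touches each page
    once, while a row-by-row cursor re-fetches a page for every row."""
    pages = math.ceil(total_rows / rows_per_page)
    set_based = pages        # one read per page
    rbar = total_rows        # one page read per row
    return set_based, rbar

print(logical_reads(1_000_000))  # (10000, 1000000)
```

The gap between the two grows with the number of rows that fit on a page, which is why the read counts diverge so dramatically in the measurements above.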
This paradigm shift is all there is to it! SQL Server has built-in aggregate functions to accomplish the task of summing a number of rows, and many others. The key difference is that, in set-based thinking, instead of working with individual rows, we work with sets, which can contain any number of rows.
To become a professional database developer you have to abandon the old habit of thinking what you want to do with each row, and start thinking instead about what you want to do with the set, by manipulating the columns. Think vertical instead of horizontal!
If we take this example a step further, considering the task of multiplying together all the rows, we will encounter the second biggest barrier to becoming a professional database developer, and that is math skills.
For a .NET developer, this pseudo code would be the way to go:
Dim intP As Integer = 1 ' Initialise the product to 1
Dim objR As Row

For Each objR In coll.Rows
intP *= objR.Data
Next

Debug.Print intP
However, we just established that we no longer want to work with single rows, right? Unfortunately, however, while SQL Server provides a handy SUM aggregate function, it does not provide a MULTIPLY aggregate function. So, how do we perform this task in a set-based fashion? This is where your math skills become handy.
A * B * C = e^(ln(A) + ln(B) + ln(C))
Having transformed the multiplication into a sum, it's easier to see that the final query should look something like this:
SELECT EXP ( SUM ( LOG ( Data ))) FROM @Sample
But we still have an issue here: logarithms are defined only for positive values, excluding zero (taking the logarithm of zero or of a negative value is undefined and will raise an Invalid Floating Point operation error in T-SQL), and multiplication by zero, under any circumstances, yields zero. Also, remember that multiplying an odd number of negative numbers results in a negative product, while multiplying an even number of negative numbers results in a positive product. Here is one example of how to solve this problem:
IF EXISTS ( SELECT * FROM @Sample WHERE Data = 0 )
    SELECT 0.0E  -- If any value is zero, the total product is zero.
ELSE
    SELECT CASE IsNegativeProduct
               WHEN 1 THEN -EXP ( theSum )
               ELSE EXP ( theSum )
           END
    FROM (
        SELECT SUM ( LOG ( ABS ( Data ))),  -- Sum all exponents
               -- Keep track of whether the product is negative or positive
               SUM ( CASE WHEN Data < 0 THEN 1 ELSE 0 END ) % 2
        FROM @Sample
    ) AS d ( theSum , IsNegativeProduct )
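To make the zero and sign handling concrete, here is the same logic sketched in Python (for illustration only; it mirrors the structure of the T-SQL solution rather than replacing it):

```python
import math

def product_via_logs(values):
    """Compute a product as exp(sum(log(abs(x)))), handling zeros and
    the sign of the result separately, just as the T-SQL query does."""
    if any(v == 0 for v in values):
        return 0.0                                  # any zero makes the product zero
    log_sum = sum(math.log(abs(v)) for v in values)  # sum the exponents
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 == 1 else 1.0       # odd count of negatives -> negative product
    return sign * math.exp(log_sum)

print(round(product_via_logs([1, 2, -4, 5, 8])))  # -320, matching the sample data
```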
The main reason for the checks in the above code is the limitations of the LOG function. So, can we complete the task in another fashion and still produce the correct result? Yes, we can:
DECLARE @p INT = 1

UPDATE @Sample SET @p *= Data

SELECT @p
This code, which looks very much like the procedural pseudo-code, produces the right answer, a value of -320. But how does it work? The difference is that we tell T-SQL to multiply the values, but we do not force T-SQL to retrieve the rows in any particular order. Another reason we can use the code above is the nature of a product: if we want the product of A and B (a * b), the reverse (b * a) is also valid; it doesn't matter in which order we take the values in the multiplication, as long as they are of the same data type.
This leads us to the third 'light bulb' moment on the road to becoming a professional database developer: understanding that rows in a database have no order. This is very different to the way .NET developers think. A collection of objects is normally a doubly-linked list, where you can traverse forward and backward. Rows in a table can be logically ordered by assigning a sequence number or a sequence denoted by date and time, but you can never rely on the physical order of rows in a set.

Summary

  • Think in sets and not in single rows
  • Brush up your math skills as much as you possibly can
  • Rows have no determined physical order
Coming Next….
Efficient storage and retrieval: Normalization rules.

Scott Hanselman - Visual Studio 11 Beta in Context


Today Visual Studio 11 Beta is released and available for download. Don't want to read a big blog post? Phooey on you then! ;)
Made it this far? OK, cool. I wanted to do a post that would not only point you to a bunch of other resources, but more generally answer the obvious questions. The questions that I asked before I went to work for Microsoft four years ago. I always ask: What's changed, and why should I care?

"One ASP.NET"

One of the things that the fellows and I are working on, which will become more obvious after the beta and even a little after the final release, is this idea of One ASP.NET. We're sweeping through the whole framework, samples, templates and NuGet to make sure things work together cohesively. You'll hear more about this as we firm it up.
Some guiding principles for ASP.NET are these:
  • Every reasonable combination of subsystems just works
  • The model is easily extended by community projects (not just us!)
  • Every subsystem or application type includes a *.Sample that works together with others
Here's the boxes diagram I've been playing with.

These principles are some of the things that drove (and continue to drive) ASP.NET this development cycle. We are trying to give you great mobile options, great HTML5, CSS3 and JavaScript support and also including open source libraries like Modernizr, jQuery, jQuery UI and Knockout.
We are working towards a more pluggable, more friendly but still powerful ASP.NET. Again, more on this soon and some surprises. We'll see more interesting uses of NuGet, more plug-ablity, more examples, and more systems working together.
You want to mix and match an ASP.NET Web API, serialization with JSON.NET, use MongoDB, run Web Pages and Web Forms side by side, add some SignalR and a few WCF enterprise Web Services? Add in ELMAH, Glimpse, Image Resizer and your favorite NuGet packages? Totally. It's encouraged. It's all One ASP.NET.

.NET Framework 4.5 Beta

For the most part in my experience, .NET 4.5 is a very compatible release. .NET 4.5 upgrades .NET 4 as .NET 3.5 upgraded .NET 3 (and 2, although we're trying to play by the versioning rules now, thank goodness.) The vast majority of .NET 4 apps should work fine on .NET 4.5 unless you are doing something exotic. I haven't had any problems myself, but I've heard of some unusual edge cases with folks doing ILMerge and a few other things.
There's a number of new improvements. Some of my personal favorites and features I'm most interested in are (these are somewhat obscure, but nice fixes, IMHO):
  • Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out.
  • Zip compression improvements to reduce the size of a compressed file.
  • Better performance when retrieving resources.
  • Updates to MEF to better support generics.
  • New asynchronous methods in I/O classes for asynchronous file operations
  • Support for Internationalized Domain Name parsing
  • WPF Ribbon Control
  • WCF HTTPS protocol mapping
  • WCF Asynchronous streaming support
  • WCF Contract-first development  as well as ?singleWSDL for service URLs
Test your apps and PLEASE tell us if you have trouble. This is a beta and there is still time to fix things.
Please don’t hesitate to post a comment on team blogs, or at one of the forums that are actively monitored: Connect (report bugs), UserVoice (request features) and MSDN Forums (ask for help). I know that folks have issues with Connect sometimes, but folks are actively monitoring all these places and are trying to get back to you with clear answers.

ASP.NET Core Framework

Here's a detailed release notes document about what's new in ASP.NET 4.5 and Visual Studio "Web Developer" 11 Beta. The core ASP.NET framework has a lot of new support around asynchrony. Asynchrony has been a theme throughout the whole Visual Studio 11 process and ASP.NET is full of improvements around this area.
There's support for the await keyword, and Task-based modules and handlers.
private async Task ScrapeHtmlPage(object caller, EventArgs e)
{
    WebClient wc = new WebClient();
    var result = await wc.DownloadStringTaskAsync("http://www.microsoft.com");
    // Do something with the result
}
Even IHttpAsyncHandler (a classic, and a difficult thing to get right) has a friend now:
public class MyAsyncHandler : HttpTaskAsyncHandler
{
    // ...
    // ASP.NET automatically takes care of integrating the Task based override
    // with the ASP.NET pipeline.
    public override async Task ProcessRequestAsync(HttpContext context)
    {
        WebClient wc = new WebClient();
        var result = await wc.DownloadStringTaskAsync("http://www.microsoft.com");
        // Do something with the result
    }
}
There's security improvements with the inclusion of core encoding routines from the popular AntiXssEncoder library, and you can plug in your own.
ASP.NET also has WebSockets support when running on Windows 8:
public async Task MyWebSocket(AspNetWebSocketContext context)
{
    WebSocket socket = context.WebSocket;
    while (true)
    {
        ...
    }
}
Bundling and minification is built in, and it's also pluggable, so you can swap the built-in techniques out for your own, or for your favorite open source alternative.
There's lots of performance improvements including features for dense workloads that can get up to a 35% reduction in startup time and memory footprint with .NET 4.5 and Windows 8.
ASP.NET 4.5 also supports multi-core JIT compilation for faster startup and more support for tuning the GC for your server's specific needs.

ASP.NET Web Forms

There's lots of refinements and improvements in Web Forms. Some favorites are strongly-typed data controls. I blogged about this before in my Elegant Web Forms post. There's two-way data-binding in controls like the FormView now, instead of using Eval(), and you'll also get IntelliSense in things like Repeaters with strongly typed modelTypes.
Web Forms also gets Model Binding (all part of the One ASP.NET strategy) which is familiar to ASP.NET MVC users. Note the GetCategories call below that will bind to a View with IQueryable.
public partial class Categories : System.Web.UI.Page
{
    private readonly DemoWhateverDataContext _db = new DemoWhateverDataContext();

    public void Page_Load()
    {
        if (!IsPostBack)
        {
            // Set default sort expression
            categoriesGrid.Sort("Name", SortDirection.Ascending);
        }
    }

    public IQueryable<Category> GetCategories()
    {
        return _db.Categories;
    }
}
In this example, rather than digging around in the Request.QueryString, we get our keyword parameter this way:
public IQueryable<Product> GetProducts([QueryString] string keyword)
{
    IQueryable<Product> query = _db.Products;
    if (!String.IsNullOrWhiteSpace(keyword))
    {
        query = query.Where(p => p.ProductName.Contains(keyword));
    }
    return query;
}
Web Forms also get unobtrusive validation, HTML 5 updates and elements, and those of you who like jQuery but also like Web Forms Controls (as well as Ajax Control Toolkit fans) will be thrilled to check out the JuiceUI project. It's an open-source collection of ASP.NET Web Forms components that makes jQuery UI available in a familiar way for Web Forms people.

ASP.NET MVC and Web API

Last week I blogged about Making JSON Web APIs with ASP.NET MVC 4 Beta and ASP.NET Web API. ASP.NET MVC 4 includes these new features (and a few more) and is included in Visual Studio 11 Beta.
  • ASP.NET Web API
  • Refreshed and modernized default project templates
  • New mobile project template
  • Many new features to support mobile apps
  • Recipes to customize code generation
  • Enhanced support for asynchronous methods
  • Read the full feature list in the release notes
Matt Milner has a great post on where ASP.NET Web API and WCF proper meet and diverge, and why you'd use one over the other. I'll be doing a more detailed post on this also, but I like Matt's quote:
WCF remains the framework for building services where you care about transport flexibility. WebAPI is the framework for building services where you care about HTTP.

ASP.NET Web Pages 2

New features include the following:
  • New and updated site templates.
  • Adding server-side and client-side validation using the Validation helper.
  • The ability to register scripts using an assets manager.
  • Enabling logins from Facebook and other sites using OAuth and OpenID.
  • Adding maps using the Maps helper.
  • Running Web Pages applications side-by-side.
  • Rendering pages for mobile devices.
There's lots more to talk about in Razor and Web Pages 2 that I will cover when Web Matrix 2 comes out.

Visual Studio - for Web Developers

Lots of new web features - hello, Opera users! There's an extensive list of features and fixes on the Web Developer Tools team blog. Here are my favorites.
The HTML Editor is smart about HTML5 and you can develop smart HTML5 sites with any ASP.NET technique.
The CSS Editor has a new formatter, color picker, better indentation, smart snippets and vendor-specific IntelliSense. That's Webkit and Opera in the screenshot over there.
The Javascript Editor has IE10's Javascript engine and supports Javascript as a 1st class citizen with all the support you get in other languages like Go To Definition, brace matching, and more.
Page Inspector is all new and lets you see which elements in the source files (including server-side code) have produced the HTML markup that is rendered to the browser. IIS Express is now the default web application host.

All the Links

General Info

Download

Secondary Downloads (for the IT folks)
Got Visual Studio issues? Complain (kindly) and vote up features and concerns at their UserVoice site.
Got ASP.NET issues? Complain to me (kindly) and vote up features and concerns at our UserVoice site or ask questions in the ASP.NET forums. There will also be new videos, tutorials and information at http://asp.net/vnext and we are continuing to update the site with fresh content.
Hope you enjoy the Beta. Please do take a moment to install it, try it out, and offer feedback. There is time for us to make changes, fixes and improvements but only if you give feedback.

Sponsor: My thanks to DevExpress for sponsoring this week's feed. There is no better time to discover DevExpress. Visual Studio 11 beta is here and DevExpress tools are ready! Experience next generation tools, today.


© 2012 Scott Hanselman. All rights reserved.


2/23/2012

Recovering a SQL Server Database from Suspect Mode

A couple of days back, I got a call from my support team informing me that one of our databases, located on the Production Server, had gone into Suspect mode. The version used was SQL Server 2005 Service Pack 3. Being a production database server, it was a Priority 1 incident and the expected time of resolution was 4 hours.
Solution:
The first step was to identify why this incident occurred, and after investigation it was found to be due to corruption of the transaction log file of the database.
I connected to SSMS using the sa login credentials and located the SUSPECT database:

I then reset the status of the SUSPECT Database by executing the below T-SQL query against the master database.
EXEC sp_resetstatus 'test_dr';
sp_resetstatus turns off the suspect flag on a database. This procedure updates the mode and status columns of the named database in sys.databases. Also note that only logins with sysadmin privileges can perform this:

As you can see in the above screen capture, the T-SQL query gave the warning message upon execution:
You must recover this database prior to access
The next step was to set the SUSPECT database into an EMERGENCY mode. This was done by executing the below SQL query against the master database.
ALTER DATABASE test_dr SET EMERGENCY
Once the database is set to EMERGENCY mode, it becomes a READ_ONLY copy, and only members of the sysadmin fixed server role have privileges to access it. The basic purpose of this is to facilitate troubleshooting: I did not want other users updating the database while it was being worked on.

As you can see from the above screen capture, once the T-SQL query got executed successfully the state of the database changed from SUSPECT to EMERGENCY.
Once the database state was changed to EMERGENCY, I performed a consistency check by executing the below T-SQL query against the master database.
DBCC checkdb('test_dr')
Which resulted in the below output:

As seen from the above screen capture, there was no issue with respect to the consistency of the test_dr database. This confirmed that the logical and physical integrity of the database was intact.
The next step was to set the database to SINGLE USER mode with ROLLBACK IMMEDIATE. To do this the below SQL query was executed against the master database.
ALTER DATABASE test_dr
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE
The above query will roll back any open transactions in the test_dr database and bring the database into single-user mode.
Please refer to the screen capture below:

The next step was to perform a DBCC Checkdb along with Repair with Data Loss by executing the below T-SQL query against the master database.
DBCC CheckDB ('test_dr', REPAIR_ALLOW_DATA_LOSS)
This query will attempt to repair all reported errors. These repairs can cause some data loss.
Once DBCC CheckDB with the REPAIR_ALLOW_DATA_LOSS option was executed, the database went into single-user mode as shown below:

After performing the above step, the database was brought ONLINE and multi-user access was enabled by executing the below T-SQL query against the master database.
ALTER DATABASE test_dr SET MULTI_USER
Please refer to the screen capture below.

As you can see from the above screen capture the database named test_dr is back ONLINE. I am even able to view its objects as shown below:

As a final step, for safety, I again checked the consistency of the database that had just been repaired and brought ONLINE (i.e. the test_dr database) by executing the below T-SQL query against the master database.
 DBCC CheckDB ('test_dr')

After performing the above steps I ensured that all the required logins had access to the database with proper privileges. The application started working fine and the business was back on track. It took just 38 minutes to bring the SUSPECT database back ONLINE.

Pareto Charts in SSRS

The purpose of a Pareto chart is to highlight the most important amongst a set of factors. For example, in quality control for a manufacturer, a Pareto chart can highlight the most common sources of defects and the highest occurring type of defect.
The Pareto principle is also known as the 80-20 rule; for quality control in a manufacturing environment, it suggests that 80% of defects may be expected to come from 20% of the manufacturing issues.

Let us say we need to display the below data in a Pareto chart.
After creating the SSRS report, drag and drop the chart and configure it as bar chart. Then drag and drop the Model Name to the x-axis and Sales Amount to the data region.
Select the graph area or chart series (note that you need to select the bars of the graph) and then press F4 (Properties). In the Custom Attributes, select Pareto for ShowColumnAs, as shown below:

You will then be able to generate your Pareto Chart.

In this graph, the individual models are shown as the bars, and the line is the cumulative total, which shows that 80% of the sales are generated by the five best-selling models.
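The cumulative line is straightforward to compute yourself; here is a Python sketch of the calculation (the sales figures below are hypothetical, not taken from the report above):

```python
def pareto_cumulative(sales):
    """Sort amounts descending and return the cumulative percentage of the
    total - the same calculation the Pareto line in the chart performs."""
    ordered = sorted(sales, reverse=True)
    total = sum(ordered)
    cumulative, running = [], 0
    for amount in ordered:
        running += amount
        cumulative.append(round(100 * running / total, 1))
    return cumulative

# Hypothetical sales amounts per model
print(pareto_cumulative([500, 300, 100, 60, 40]))  # [50.0, 80.0, 90.0, 96.0, 100.0]
```

Reading the result, the first two models already account for 80% of the total, which is exactly the kind of insight the chart is designed to surface.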

Knockout Session from South Florida Code Camp

by JohnPapa.net 

I had a great time at the South Florida Code Camp last weekend presenting a whirlwind tour of Knockout and JavaScript patterns. The rooms were small and way overpacked, but I’ll take that as a sign that the topic is popular. :)
The Knockout session is a whirlwind tour of KnockoutJS’s features. If you like it and want to see more in-depth material on Knockout, you can check out my full course at Pluralsight titled Building HTML5 and JavaScript Apps with MVVM and KnockoutJS.
Here are the slides and sample code from the presentation at code camp. Thanks for attending!