6/27/2014

Setting Up Data And Log Files For SQL Server

When a database is created in SQL Server it creates two files by default: one is the data file with an extension of .mdf that stores the actual data, and the other is the log file with an extension of .ldf that stores the transaction log information used to recover data in disaster recovery scenarios. The mdf is the primary data file; in case of very large databases it might be useful to have secondary data files with an extension of .ndf in which case  the data will be stored in both mdf as well as ndf files.
The placement of mdf, ndf and ldf files on a particular drive is a matter of decision based on the size requirements of each file, and more importantly on the impact it will have on the overall performance of the database. As the log file is write-intensive it should be placed on a separate drive and the data files also  placed on separate drives of their own. Even secondary data files should be placed on a separate drives to optimize performance. Why? This is because, it increases parallelism while accessing the data and log files for read/write operations.
The log file is write-intensive, and the data file (depending on your application) is read/write intensive. Suppose you have multiple users simultaneously updating and requesting data. For data retrieval, the log file is not needed, and so the requests for data can be satisfied from the data file placed on its own drive. This would involve the movement of only the spindle of that drive on which the data file is kept. For data updates, the log file is heavily used, and so any write operations would involve the movement of the spindle of the drive on which the log file is placed. Now, if both log and data files are placed on the same drive, the amount of spindle movement would be heavy so as to satisfy both read and write operations. But, if you place the data and log files on separate disk drives, then the spindle movement for write operations won’t interfere with that of the read operations, since each drive’s spindle would be moving independently to satisfy read as well as write requests, thus improving the performance of data retrieval and data updates.
Ideally, the separation of data and log files on drives should be done while creating the database, but this can be done later also after the database is created and is in use. SQL Server is very flexible in that. The way to go about doing this is to use the two statements that SQL Server provides: CREATE DATABASE   and   ALTER DATABASE. Let’s look at both examples where database_name is the name of the database being created and/or altered, and os_path is the operating system file path.
While creating a database, you use the Create Database statement in the following manner:
CREATE DATABASE database_name ON
(NAME=Database_name_data,
 FILENAME = ‘os_pathdatabase_name_data.mdf’),
(NAME=Database_name_log,
 FILENAME = ‘os_pathdatabase_name_data.ldf’)
If you want to add secondary data files, the statement would be:
CREATE DATABASE database_name ON
(NAME=Database_name_data,
 FILENAME = ‘os_pathdatabase_name_data.mdf’),
(NAME=Database_name_data,
 FILENAME = ‘os_pathdatabase_name_data.ndf’),
 (NAME=Database_name_log,
 FILENAME = ‘os_pathdatabase_name_data.ldf’)
If you already have a database and want to change the drive location of either the data or log file, then use the Alter Database statement in the following manner:
First, bring the database to an offline status by giving the following command:
ALTER DATABASE database_name SET OFFLINE
Next, change the location of either the data file or the log file with the following command for each file:
ALTER DATABASE database_name MODIFY FILE ( NAME = database_name_data, FILENAME = 'new_pathdatabase_name_data.mdf' )
ALTER DATABASE database_name MODIFY FILE ( NAME = database_name_data, FILENAME = 'new_pathdatabase_name_data.ndf' )
ALTER DATABASE database_name MODIFY FILE ( NAME = database_name_data, FILENAME = 'new_pathdatabase_name_data.ldf' )
Finally,  bring the database back online by issuing the following command(but before giving this command make sure you copy the data and log files to their new locations):
ALTER DATABASE database_name SET ONLINE
After this you should verify the changed locations of the files by querying the sys.master_files catalog view:
SELECT name, physical_name AS CurrentLocation, state_desc
FROM sys.master_files
WHERE database_id = DB_ID(N'database_name');
The above procedure is valid for both SQL Server 2008 and 2005.

5/28/2014

Introducing Windows Azure Web Jobs

Ref: http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx

'm currently running 16 web sites on Windows Azure. I have a few Virtual Machines, but I prefer to run things using "platform as a service" where I don't have to sweat the underlying Virtual Machine. That means, while I know I can run a Virtual Machine and put "cron" jobs on it, I'm less likely to because I don't want to mess with VMs or Worker Roles.
There are a few ways to run stuff on Azure, first, there's IAAS (Infrastructure as a Service) which is VMs. Then there's Cloud Applications (Cloud Services) where you can run anything in an Azure-managed VM. It's still a VM, but you have a lot of choice and can run Worker Roles and background stuff. However, there's a lot of ceremony if you just want to run your small "job" either on a regular basis or via a trigger.
Azure Explained in one Image
Looking at this differently, platform as a service is like having your hotel room fixed up daily, while VMs is more like managing a house yourself.
Azure Explained in one Image

As someone who likes to torch a hotel room as much as the next person, this is why I like Azure Web Sites (PAAS). You just deploy, and it's done. The VM is invisible and the site is always up.
However, there's not yet been a good solution under web sites for doing regular jobs and batch work in the background. Now Azure Web Sites support a thing  called "Azure WebJobs" to solve this problem simply.

SCALING A COMMAND LINE APPLICATION WITH AZURE WEBJOBS

When I want to do something simple - like resize some images - I'll either write a script or a small .NET application. Things do get complex though when you want to take something simple and do it times. Scaling a command line app to the cloud often involves a lot of yak shaving.
Let's say I want to take this function that works fine at the command line and run it in the cloud at scale.
public static void SquishNewlyUploadedPNGs(Stream input, Stream output)
{
    var quantizer = new WuQuantizer();
    using (var bitmap = new Bitmap(input))
    {
        using (var quantized = quantizer.QuantizeImage(bitmap))
        {
            quantized.Save(output, ImageFormat.Png);
        }
    }
}
WebJobs aims to make developing, running, and scaling this easier. They are built into Azure Websites and run in the same VM as your Web Sites.
Here's some typical scenarios that would be great for the Windows Azure WebJobs SDK:
  • Image processing or other CPU-intensive work.
  • Queue processing.
  • RSS aggregation.
  • File maintenance, such as aggregating or cleaning up log files. 
  • Other long-running tasks that you want to run in a background thread, such as sending emails.
WebJobs are invoked in two different ways, either they are triggered or they are continuously running. Triggered jobs happen on a schedule or when some event happens and Continuous jobs basically run a while loop.
WebJobs are deployed by copying them to the right place in the file-system (or using a designated API which will do the same). The following file types are accepted as runnable scripts that can be used as a job:
  • .exe - .NET assemblies compiled with the WebJobs SDK
  • .cmd, .bat, .exe (using windows cmd)
  • .sh (using bash)
  • .php (using php)
  • .py (using python)
  • .js (using node)
After you deploy your WebJobs from the portal, you can start and stop jobs, delete them, upload jobs as ZIP files, etc. You've got full control.
A good thing to point out, though, is that Azure WebJobs are more than just scheduled scripts, you can alsocreate WebJobs as .NET projects written in C# or whatever.

MAKING A WEBJOB OUT OF A COMMAND LINE APP WITH THE WINDOWS AZURE WEBJOBS SDK

WebJobs can effectively take some command line C# application with a function and turn it into a scalable WebJob. I spoke about this over the last few years in presentations when it was codenamed "SimpleBatch." This lets you write a simple console app to, say, resize an image, then move it up to the cloud and resize millions. Jobs can be triggered by the appearance of new items on an Azure Queue, or by new binary Blobs showing up in Azure Storage.
NOTE: You don't have to use the WebJobs SDK with the WebJobs feature of Windows Azure Web Sites. As noted earlier, the WebJobs feature enables you to upload and run any executable or script, whether or not it uses the WebJobs SDK framework.
I wanted to make a Web Job that would losslessly squish PNGs as I upload them to Azure storage. When new PNGs show up, the job should automatically run on these new PNGs. This is easy as a Command Line app using the nQuant open source library as in the code above.
Now I'll add the WebJobs SDK NuGet package (it's prerelease) and Microsoft.WindowsAzure.Jobs namespace, then add [BlobInput] and [BlobOutput] attributes, then start the JobHost() from Main. That's it.
using Microsoft.WindowsAzure.Jobs;
using nQuant;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
 
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            JobHost host = new JobHost();
            host.RunAndBlock();
        }
 
        public static void SquishNewlyUploadedPNGs(
            [BlobInput("input/{name}")] Stream input,
            [BlobOutput("output/{name}")] Stream output)
        {
            var quantizer = new WuQuantizer();
            using (var bitmap = new Bitmap(input))
            {
                using (var quantized = quantizer.QuantizeImage(bitmap))
                {
                    quantized.Save(output, ImageFormat.Png);
                }
            }
 
        }
    }
}
CONTEXT: Let's just step back and process this for a second. All I had to do was spin up the JobHost and set up a few attributes. Minimal ceremony for maximum results. My console app is now processing information from Azure blob storage without ever referencing the Azure Blob Storage API!
The function is automatically called when a new blob (in my case, a new PNG) shows up in the input container in storage and the Stream parameters are automatically
"bound" (like Model Binding) for me by the WebJobs SDK.
To deploy, I zip up my app and upload it from the WebJobs section of my existing Azure Website in the Portal.
image
Here it is in the Portal.
image
I'm setting mine to continuous, but it can also run on a detailed schedule:
12schdmonthsonpartweekdaysoccurences
I need my WebJob to be told about what Azure Storage account it's going to use, so from my Azure Web Site under the Connection Strings section I set up two strings, one for the AzureJobsRuntime (for logging) and one for AzureJobsData (what I'm accessing). 
image
For what I'm doing they are the same. The connection strings look like this:
DefaultEndpointsProtocol=https;AccountName=hanselstorage;AccountKey=3exLzmagickey
The key here came from Manage Access Keys in my storage account, here:
image
In my "Hanselstorage" Storage Container I made two areas, input and output. You can name yours whatever. You can also process from Queues, etc.
image
Now, going back to the code, look at the parameters to the Attributes I'm using:
public static void SquishNewlyUploadedPNGs(          
   [BlobInput("input/{name}")] Stream input,           
   [BlobOutput("output/{name}")] Stream output)
There's the strings "input" and "output" pointing to specific containers in my Storage account. Again, the actual storage account (Hanselstorage) is part of the connection string. That lets you reuse WebJobs in multiple sites, just by changing the connection strings.
There is a link to get to the Azure Web Jobs Dashboard to the right of your job, but the format for the URL to access is this: https://YOURSITE.scm.azurewebsites.net/azurejobs. You'll need to enter your same credentials you've used for Azure deployment.
Once you've uploaded your job, you'll see the registered function(s) here:
image
I've installed the Azure SDK and can access my storage live within Visual Studio. You can also try 3rd party apps like Cloudberry Explorer. Here I've uploaded a file called scottha.png into the input container.
image
After a few minutes the SDK will process the new blob (Queues are faster, but blobs are checked every 10 minutes), the job will run and either succeed or fail. If your app throws an exception you'll actually see it in the Invocation Details part of the site.
image
Here's a successful one. You can see it worked (it squished) because of the number of input bytes and the number of output bytes.
image
You can see the full output of what happens in a WebJob within this Dashboard, or check the log files directly via FTP. For me, I can explore my output container in Azure Storage and download or use the now-squished images. Again, this can be used for any large job whether it be processing images, OCR, log file analysis, SQL server cleanup, whatever you can think of.
Azure WebJobs is in preview, so there will be bugs, changing documentation and updates to the SDK but the general idea is there and it's solid. I think you'll dig it.