Tuesday, July 9, 2013

SQL Linear Regression

This post is  part of the Regression and Model Fitting Tutorial

Linear regression is one of the all time most used model fitting techniques used in the world today. As part of a recent project I had a use case where I was generating  a lot of reports with a lot of charts that used linear regression. Having someone run these by hand would have taken months where a small amount of development effort reduced that effort so that a number of charts could be generated with the most current data on a daily basis.

Before we get too far into the details of the query, we should review a little bit of the math behind linear regression. The basic idea of linear regression is that we want to fit a line (usually in the form of y = a + bx ) against a set of points where we minimize the total "overall" error between the line and the data points (in the figure below the yellow line is the regression line and the blue points are the data points):



Mathematically, this is a problem that is well understood and breaks down to simultaneously solving a set of linear equations using the method of least squares. The equations take the following form, solving for a and b in a way that minimizes the overall error of the model:



Although you can algebraically solve it in this form, it is extremely tedious and quickly gets out of hand when you have more than a few data points. The best way that I've learned to solve these equations goes back to linear algebra and matrices. We take our observations and create the following three matrices (x1, x2, x3, etc are observations of the X variable and y1, y2, y3, etc are observations of the Y variable):



and we solve the following equation to get a and b left in X.



In the end, we end up with this (this can be simplified a step further to have average notation.:



Anyway, it is relatively easy to build a query that returns these parameters. Below is the SQL query that I use for linear regression. This query will work in MySql and Microsoft SQL Server. Depending on whether other platforms have the functions required (ex. Postgres, Oracle, IBM DB2), it may work there as well. Beware of how your DB platform handles summations as you may run into numerical issues on some data sets:


-- Developed by Mike Burr
-- This query calculates a linear regression
-- in the form y = a + bx and calculates
-- the correlation coefficient for the source data


select a as 'a',
       b as 'b',
       -- Correlation coefficient
       (ss_xy * ss_xy)/ (ss_xx * ss_yy) as 'r_r'
from (
   -- In this inner query we calculate the parameters
   -- and the correlation coefficient for the linear model
   -- that we calculated

   select
      ((avg_yi * sum_xi_xi) - (avg_xi * sum_xi_yi )) /
      (sum_xi_xi-(n* avg_xi * avg_xi))
      as 'a',

      (sum_xi_yi - (n * avg_xi * avg_yi)) /
      (sum_xi_xi - (n * avg_xi * avg_xi))
      as 'b',
  
      sum_xi_xi - (n * avg_xi * avg_xi )
      as 'ss_xx',

      sum_yi_yi - (n * avg_yi * avg_yi )
      as 'ss_yy',

      sum_xi_yi - (n * avg_xi * avg_yi )
      as 'ss_xy'

   from (
      -- In this inner query, we build the
      -- variables used in the linear regression
      -- calculation
     
      select avg(y) as 'avg_yi',
             avg(x) as 'avg_xi',
             count(x) as 'n',
             sum(x*x) as 'sum_xi_xi',
             sum(y*y) as 'sum_yi_yi',
             sum(x*y) as 'sum_xi_yi',
             sum(x) as 'sum_xi'       
      from (
     
         -- Insert source data query here
         -- Alias the x-variable column as 'x'
         -- Alias the y-variable column as 'y'

      ) as source_data
   ) as regression
) as final_parameters



Back to Mike's Big Data, Data Mining, and Analytics Tutorial






Monday, June 17, 2013

How to Prepare for A Wildfire Related Evacuation

I had an experience on Tuesday that I hope not too many people have. I ended up getting evacuated from the Black Forest wildfire near Colorado Springs while I was visiting my parents who live on Herring Road (near a couple of the most severe burn areas so far). I was able to keep a relatively clear head and was able to help my parents evacuate, themselves, their grand kids, and all of their pets besides a horse and a cat (who are now confirmed to be safe).  I also helped with a couple of basics with getting their home ready for a forest fire to occur and getting some of their irreplaceable possessions packed and out of harm's way.

Even though this post is outside of my normal area, I want to reflect on some basics for home preparation and how I approached my evacuation experience to help my parents. This post will break down three areas and identify: what are the highest priorities for evacuation? How do you prepare your home and other structures before you evacuate?



What are the typical stages of an evacuation?
  • Pre-evacuation/voluntary evacuation: Danger is headed your way, but you may have some time to get things ready to go. You can probably get more than just people/pets out and
  • Mandatory evacuation: You've either received a reverse-911 call or you have a first responder knocking on your door. There is a clear and present danger and you need to leave immediately so that you aren't killed in the wildfire. You should get people (and pets if there is time our) and go immediately. There is no time to pack possessions and you'll need to hope for the best at this point.
  • You should start packing and preparing as soon as you are aware of a possible threat to your property. By the time evacuation becomes mandatory, it is already too late to pack/prepare and any delay could mean the difference between life and death. You might even consider packing if there is an elevated fire danger in your area, but no fire yet...
So... what are the priorities with wildfire evacuation?

  1. Get all of the people out! This is the primary concern that you and the first responders share. Wildfires can be extraordinarily hot and there is likely no safe place to "wait it out." If you get an mandatory evacuation notice or there is a clear and present danger, don't wait... get out... Go away from the fire and smoke if possible, and only travel through areas with flame and smoke if there are no reasonable alternatives (ex. driving through open fields, across terrain, etc). There's a good chance that if the fire is moving fast enough and is hot enough, there is no possible search and rescue option that first responders can exercise to get you out.
  2. Pets and livestock. This follows the "Life is irreplaceable" ideology above. When given a choice of something to lose, most people would rather lose a material possession than a pet. If you have enough time, get your pets out. If you can't transport certain pets (horses, cattle, sheep, pigs, etc), attaching halters/harnesses and opening the gates may be your only option. If possible, ensure that all gates to the nearest road or open space are open to provide an escape route. Consider cutting fences and other obstructions to the nearest road if the fire is moving in the direction that will cut out possible escape routes through open gates.  Volunteer organizations and first responders mobilize to move "found" pets to the edges of the wildfire areas when possible so that they can safely be taken to boarding areas that are set up for disaster relief. Additionally, first responders may work to take down fences in areas where there are a lot of livestock properties to allow livestock to escape to fire boundaries.
  3. Irreplaceable and difficult-to-replace material possessions. If you have material possessions that cannot be replaced, you may try to pack as many as you can before you need to leave. From highest to lowest, here's what I think someone should take:
    • Required Medications and a first aid kit.
    • Cell phone/charger
    • Tools to help clear a defensible space around yourself in the event of a breakdown. This includes shovel, rake, hatchet/chainsaw, fire extinguisher, water, particulate masks or wet cloth (to assist with breathing in the presence of heavy smoke).
    • Photo albums and photo storage devices.
    • Family heirlooms in the form of books, letters, etc.
    • Family Photo/Video Blu-Ray, DVD, CD-ROM, VHS, Beta
    • Laptops, tablets, MP3 players (IPod, Zune, etc), and other mobile devices.
    • Computers (you really only need the "tower" and any external storage [hard drives, tape drives, and media]). Also remember password lists and any USB drives or smart cards that "unlock" the encryption on your computer.
    • Uncashed checks, cash, and stock/bearer bond certificates. If you are insured and these are declared items, then it might only be an inconvenience to have to replace these items.
    • Records for un-invoiced work if you have a home-based business or store store records in the property that is threatened.
    • Accounting records for a home based business.
    • Signed contracts that are not filed with a county clerk or secretary of state, business incorporation documents, LLC formation, partnership agreements, real estate title records, motor vehicle titles, birth certificates, passports, tax records, and other important legal documents. These records may be retrieved from a secretary of state or county clerk and recorder if they were required to be filed.
    • Firearms and ammunition that are not stored in fireproof safes. Discharging firearms/ammunition present a danger to first responders working to fight the wildfire and save structures. If you can't take these with you, then put them in a location that will minimize their impact, such as a filled bathtub, bucket of water, metal box, stove, freezer, or refrigerator. If the structure burns down, you don't want first responders to be hit with discharging ammunition.
    • Other irreplaceable items, such as family heirlooms and antiques.
    • Finally, at the very bottom of the list, pack easily replaceable items as if you are packing for a trip. This includes clothes, food, water, toiletries, etc.
  4. Make any possible preparations for the structures that will burn. If you have time and you have the ability, you might take the following precautions to help firefighters protect your structures. I list these from the easiest to hardest.
  • If there is time, shut off the power to your structures. This will help change the fire from a Type 3 (electrical fire) to Type 1 (regular combustible organic solid) and allow the firefighters to use water directly without risk of electrocution and without aggravating the fire due to electricity.
  • Create a defensible boundary around your structures. Firefighters typically try to create a fire-line that is 1.5x the height of the tallest nearby object that will burn.
    • Water the top of your structure and surrounding areas. We started a sprinkler on the roof shortly after recognizing that the approaching fire presented a credible threat. If you have buckets/trash cans that can be filled with water, these can be placed in strategic areas to help cool down areas where the fire is likely to approach from.
    • Remove any flammables/accelerants from your structures. This includes gasoline cans, paints, mineral spirits, varnishes, barbecues, firewood, and propane/kerosene tanks. Place these items in a clear area away from the structures as these items are likely to explode and the heat should be directed away from your structures..
    • Clear combustible material (pine cones/needles/leaves/plants) away from the structures.
    • Fill buckets and other containers with water and place them in strategic areas around structures that may be burned.
    • If there is time and you have the ability/equipment, cut nearby trees and drag them away from the structure. The main risks from burning trees involve a burning tree falling onto a structure and radiant heat. The heat of the fire may still cause spontaneous combustion of the structures, but you can prevent a falling burning tree from landing on the structure and causing it to burn from direct flame contact. To cut a tree, first cut a notch in the direction of the fall and then cut a line through the tree to a point slightly above the notch from the side away from the fall. Do not fall trees if you lack the proper equipment/ability or the fall threatens power lines or other structures!

Monday, May 6, 2013

Deleting Entries in Non-Microsoft LDAP Directories in .Net and PowerShell

In this series on working with non-Microsoft directories in PowerShell and .Net

 A previous post of mine that covered LDAP searches in non-Microsoft Active Directory LDAP directories (ex. OpenLDAP, Oracle Internet Directory, Novell eDirectory, etc.) gained a fair amount of popularity and I received requests to follow up on it to provide instructions on performing other LDAP functions using the .Net Framework and PowerShell.

Continuing from my last post, I went ahead and set up an OpenLDAP server for demonstration purposes utilizing the core, cosine, and inetorgperson schemas. I have prepopulated it with the following objects:

dn: dc=mikesblog,dc=lan
objectclass: dcObject
objectclass: organization
o: Mikes Technology Blog
dc: mikesblog

dn: cn=Manager,dc=mikesblog,dc=lan
objectclass: organizationalRole
cn: Manager

dn: ou=testou1,dc=mikesblog,dc=lan
objectclass: organizationalUnit
objectclass: top
ou: testou1
description: My first test OU

dn: cn=mike,ou=testou1,dc=mikesblog,dc=lan
objectClass: person
objectclass: inetorgperson
givenName: Mike
sn: Burr
uid: mike

dn: ou=testou2,dc=mikesblog,dc=lan
objectclass: organizationalUnit
objectclass: top
ou: testou2
description: My second test OU

In this example, we will delete the testuser user in testou2.

To accomplish this, we will use the .Net framework classes in the System.DirectoryServices.Protocols namespace.  The LdapConnection class gives our application connectivity to the LDAP server where we can provide an DeleteRequest object and receive an DeleteReply. In the DeleteRequest, we specify via the Modifications property what attribute additions, modifications, and deletions need to occur.

The sample code below is designed to demonstrate how to modify attributes for an LDAP entry, in practice this might be designed to use the command pattern instead of the hardcoded transaction script below.

The first code example shows some discussion of the delete subtree server control. This is typically used to delete an object and all objects under it in a directory tree. It is not supported on all platforms. Microsoft and IBM publicize the ability to use this server control with their LDAP implementations, but others, such as OpenLDAP (as of 2.4) do not support it.

Let's get to it... The sample C#.Net code:



// Connects to myopenldap.mikesblog.lan
// on the standard port
LdapConnection c = new LdapConnection("myopenldap.mikesblog.lan:389");

//Set session options
c.SessionOptions.SecureSocketLayer = false;
c.SessionOptions.ProtocolVersion = 3;

// Pick Authentication type:
// Anonymous, Basic, Digest, DPA (Distributed Password Authentication),
// External, Kerberos, Msn, Negotiate, Ntlm, Sicily
c.AuthType = AuthType.Basic;


// Gets username and password. There are better
// ways to do this more securely...
// but that's not the topic of this post.
Console.Write("Enter Username: ");
string username = Console.ReadLine();

Console.WriteLine();

Console.Write("Enter Password: ");
string password = Console.ReadLine();

// Bind with the network credentials. Depending on the type of server,
// the username will take different forms. Authentication type is controlled
// above with the AuthType
c.Bind(new System.Net.NetworkCredential(username, password));

// We are going to delete the user below (LDIF Below)
/*

dn: cn=testuser,ou=testou2,dc=mikesblog,dc=lan
objectclass: person
objectclass: inetorgperson
uid: testuser
givenName: Test
sn: User
 
*/

DeleteRequest r = new DeleteRequest();

//What are we deleting?
r.DistinguishedName = "uid=testuser,ou=testou2,dc=mikesblog,dc=lan";

// Use the delete subtree server control to
// remove any objects under the targeted object
// Not all LDAP implementations support this, so this may
// throw an exception:
/*
System.DirectoryServices.Protocols.DirectoryOperationException: A protocol error occurred.
at System.DirectoryServices.Protocols.LdapConnection.ConstructResponse(Int32 messageId, LdapOperation operation, ResultAll resultType, TimeSpan requestTimeOut, Boolean exceptionOnTimeOut)
at System.DirectoryServices.Protocols.LdapConnection.SendRequest(DirectoryRequest request, TimeSpan requestTimeout)
at System.DirectoryServices.Protocols.LdapConnection.SendRequest(DirectoryRequest request)

 * To figure out what hapened, catch the DirectoryOperationException
 * and look at Response.ErrorMessage:
try {
...
}
catch (DirectoryOperationException e)
{
    Console.WriteLine(e.Response.ErrorMessage);
}

 * Since OpenLDAP doesn't support this control, it returns:
 * treeDelete control value not absent
 *
 * Depending ont the server's configuration, it'll also show up in the OpenLDAP logs

May  6 21:46:06 myopenldap slapd[61167]: conn=1032 op=1 RESULT tag=107 err=2 text=treeDelete control value not absent
May  6 21:46:06 myopenldap slapd[61167]: conn=1032 op=1 do_delete: get_ctrls failed
May  6 21:47:21 myopenldap slapd[61167]: conn=1032 fd=18 closed (connection lost)


 */

//Uncomment the line below for subtree delete (if your server supports it)

//r.Controls.Add(new DirectoryControl("1.2.840.113556.1.4.805", null, true, true));


//Actually process the request through the server
DeleteResponse re = (DeleteResponse)c.SendRequest(r);


if (re.ResultCode != ResultCode.Success)
{
    Console.WriteLine("Failed!");
    Console.WriteLine("ResultCode: {0}", re.ResultCode);
    Console.WriteLine("Message: {0}", re.ErrorMessage);

}


Console.Read();


Not to disappoint, below is the PowerShell port of the above code.


#Mike Burr
#Script Connects to and Deletes OpenLDAP directory entry

#Load the assemblies
[System.Reflection.Assembly]::LoadWithPartialName("System.DirectoryServices.Protocols")
[System.Reflection.Assembly]::LoadWithPartialName("System.Net")


#Connects to myopenldap.mikesblog.lan using SSL on a non-standard port
$c = New-Object System.DirectoryServices.Protocols.LdapConnection "myopenldap.mikesblog.lan:389"
          
#Set session options
$c.SessionOptions.SecureSocketLayer = $false
$c.SessionOptions.ProtocolVersion = 3

# Pick Authentication type:
# Anonymous, Basic, Digest, DPA (Distributed Password Authentication),
# External, Kerberos, Msn, Negotiate, Ntlm, Sicily
$c.AuthType = [System.DirectoryServices.Protocols.AuthType]::Basic

# Gets username and password.
$user = Read-Host -Prompt "Username"
$pass = Read-Host -AsSecureString "Password"

$credentials = new-object "System.Net.NetworkCredential" -ArgumentList $user,$pass

# Bind with the network credentials. Depending on the type of server,
# the username will take different forms. Authentication type is controlled
# above with the AuthType
$c.Bind($credentials)

# We are going to Delete this object (LDIF Below)
#

# dn: cn=testuser,ou=testou2,dc=mikesblog,dc=lan
# objectclass: person
# objectclass: inetorgperson
# givenName: Test
# sn: User
# uid: testuser

$r = (new-object "System.DirectoryServices.Protocols.DeleteRequest")
$r.DistinguishedName = "cn=testuser,ou=testou2,dc=mikesblog,dc=lan"

#See the code example above for discussion on the delete subtree control
# $r.Controls.Add((New-Object "System.DirectoryServices.Protocols.DirectoryControl" -ArgumentList "1.2.840.113556.1.4.805",$null,$true,$true ))

#Actually process the request through the server
$re = $c.SendRequest($r)

if ($re.ResultCode -ne [System.directoryServices.Protocols.ResultCode]::Success)
{
    write-host "Failed!"
    write-host ("ResultCode: " + $re.ResultCode)
    write-host ("Message: " + $re.ErrorMessage)
}