Default Keys: Ctrl + Shift + Alt + ,
Frequently throughout my career in finance, I’ve been in situations where I needed to allocate data using Excel. As the somewhat disgusting saying goes, “There are many ways to skin a cat”, and this absolutely holds true of doing allocations in Excel. After skinning that allocation-cat so many different ways throughout the years, I’ve found that the best way is to use the SQL Left Join method. If you are an analyst that often needs to perform different allocations, or you are managing a team of analysts that performs allocations, this article might be a good resource for you.
Pros & Cons of the SQL Left Join
A quick discussion of the good and the bad of the SQL Left Join method to allocating. The reason I like this method the most is because it’s very scalable, extremely flexible, and easy to maintain.
The are a couple key things that make this method far more scalable than most other models I’ve seen. First, after the initial setup is complete, you can allocate 1 row, or 100,000 rows of data with no additional work. The only thing limiting you is the number of rows in Excel. The second thing that makes it so scalable is that it essentially uses a method to allocate that your IT department might naturally use. Because of this, when it comes time to hand it to IT to build-out using a proper database, they will quickly be able to understand what you did, and be able to port over your data.
Flexibility of an allocation model is another huge factor on whether the model survives past a single use, or blows up the instant a VP asks for a little more detail than what you have displayed; or implodes because while we initially wanted to allocate to only 5 segments, it turns out that 18 segments would be more useful; or… You get the idea. Because the SQL Left Join method doesn’t mind thousands of rows, we can perform our allocations on data at a more granular level of detail, and it will leave those details intact. Also, inherent to the way it works, it allows for changing how allocations are performed with very little work. Changing from a 5-way split to an 18-way split; no big deal. Deciding to allocate not just to 5 segments, but to have the rates be different for each quarter; not fun to maintain, but very doable. I will demonstrate these scenarios later throughout the article.
What makes this model so easy to maintain is that there are not a lot of complex formulas to manage or modify (because the SQL table does all the work) and the general data structure is the same regardless of allocation methodology — so if an analyst has seen the method used before in a different model, they will already have most of the knowledge necessary to maintain any new models.
The bad. Like everything else in life, the SQL Left Join method does have some drawbacks; it is not intuitive to setup, requires some understanding of SQL (Structured Query Language), and Excel has some not well-documented, finicky requirements in order to use it — which is where my Nutility comes in! I’ve developed the 2Table Left Join Nutility to mitigate these drawbacks by making the setup simple, and including documentation within it to point out most of the finicky gotcha items ahead of time.
This Nutility requires you to have the Nutilities Excel add-in installed in order to use it. If you don’t know what Nutilities is, or where to get it, please start at my web-page here: http://iwishexcel.com . The add-in is completely free to use. Please note that while this article discusses how to perform the SQL Left Join method using my 2Table Left Join Nutility, the method can also be setup manually by using the built-in MS Query feature in Excel (which is what my Nutility simplifies and automates for you).
SQL Left Join Basics:
Before I go into the step-by-step instructions on how to run 2Table Left Join, I think it is important that we first talk about allocations and how a SQL Left Join relates to them. Conceptually, when we describe allocating something, at its most basic level all we are doing is taking 1 thing, and splitting it up into multiple pieces. For example, if we are allocating quarters to months, we are splitting 1 quarter into 3 monthly pieces.
Let’s visually take a look at another allocation example. If our company has a miscellaneous bucket of expenses, and wants to assign a piece of those expenses to each of its 4 business unit’s GMs, we might setup a workbook with the following 3 sheets (pictured below). You can see that I have taken the single row of data on the first sheet, and then using the second sheet as my allocation guide, broken out that single row into 4 rows on the third sheet. Then I simply added a column to multiply, Total_to_Allocate times Allocation_Rate, to get my allocated amounts by GM.
Now this is all very straight forward and easy to setup for one row data, no SQL funny business necessary. I could just go ahead and do some copy-and-pasting, toss in some Excel formulas, and we’d be good-to-go. Even if you gave me 100 rows of data, I could probably manage it fairly easily as long as you had a static list of GMs. However, if you gave me 100,000 rows of data to allocate, of if you decided to change your mind about how many GMs to allocate to, or if you decided to change the allocation rate to be done by GM and by quarter, or God forbid, all of that! Updating my allocation model would certainly require lots of time, if not an entire redesign/rebuild.
This is where using a SQL Left Join steps into to save the day. What I did above, manually in Excel to make the third sheet, a SQL Left Join will do automatically. The table it produces will be refreshable, so that it can be updated as I add more data, or change my rate table. I will not go into how to write a SQL Left Join, as the 2Table Left Join Nutility does it for you, but it is necessary to understand the basics. In a Left Join, there is a Left Table, a Right Table, and the column on each you want to join on. The output from a Left Join will have every row from the Left Table. However, if it finds the value for that row in the comparison column on the Right Table, it will output all the rows it finds for it on the Right Table as well. So in the picture above, the first sheet is the Left Table, the second sheet is the Right Table, the Expense Category is the comparison column, and the third sheet is the output from the join. Because it found Miscellaneous 4 times on the right table, it brought back the row from the first sheet, that had Miscellaneous in it, 4 times.
This is clearer to see when you have another example. Below is what the output looks like when I have a added a couple more rows on the first sheet. You can see that the first 2 rows on the first sheet match 4 rows on the second sheet. So each of the rows is repeated 4 times in the output. You can also see that the last row on the first sheet did not match anything on the second sheet. So its value was output the one time with all the second sheet’s columns left blank, resulting in an allocation hole. We could easily plug this hole by adding as many “Other” rows to the second sheet as we needed.
Let’s say we wanted to allocate the Other Expense bucket to just 2 of the GMs. The setup and output would look like the picture below. All I did was add 2 lines the second sheet. It’s worth pointing out here that that is an example of how extremely flexible this setup is. I decided on the fly to allocate that bucket to 2 GMs. I could just as easily have decided to use 25 GMs, and even went ahead and changed my mind about the Miscellaneous bucket and split that into 10 GMs. For this allocation model change, we need only add/update some rows on sheet 2 and refresh. Other allocation models would fall apart with such decisions.
Using the 2Table Left Join Nutility is easy. Assuming you already have Nutilities installed, you need only have a workbook setup with at least 2 sheets. One sheet must have data on it (like first sheet above), and the other must have your allocation rates (like second sheet above). You must have a column called Nut_Join1 on your data sheet, and a column called Nut_Join2 on your rates sheet — these will be the comparison columns. With these in place, all you need to do is run the Nutility from that workbook (select a cell anywhere in the workbook and hit CTRL+Shift+Alt+, ).
You will be met with a popup giving you some details about how the Nutility works, followed by a second popup that allows you to decide if you want your comparison column to be case sensitive — while it’s slower to refresh, I prefer case insensitive because it’s less error prone. I prefer it to consider Miscellaneous and MISCELLANEOUS as the same thing. The third, and final popup asks you to confirm the table names for the Left and Right Tables. By default it will have already guessed those based on the sheets that it found Nut_Join1 and Nut_Join2 on. The confirmation is just in case you have those columns on multiple sheets for some reason. Once confirmed, you are done. A new sheet will be created in the workbook with your output data on it. You can right-click and refresh this table as necessary to capture changes made to either your data or rate table sheets.
Let’s take a look at another example, this one created using the 2Table Left Join Nutility. This time I will be breaking out regions to countries. The setup is nearly identical to what I’ve shown above — only now I have added the Nut_Join1 and Nut_Join2 columns. I have data on my first (Left Table) sheet, and a rate table on my second (Right Table) sheet. The third (Output Table) sheet is showing the table created by the Nutility. For the Nut_Join columns, I have just put a formula in each setting them equal to the Region columns. This may look redundant, keeping the original region columns and having the same value in the Nut_Join columns, but this is good practice for in case I change my mind about allocating only by region. I may later decide to have different rates for each region depending on the quarter (I will demo this next).
As you can see in the above picture, the output created by the 2Table Left Join Nutility is similar to the examples I showed earlier in this article. There are a couple differences worth noting. First, it does not have an Allocated_Amount column. You need to add the formula, Expense$ times Allocation%, yourself. The second difference is that there are a lot of columns. If you look close, there are in fact every column shown on both the first and second sheets — the Output Table will pickup any columns it finds on either of the 2 sheets, even new ones added after the Output Table was created! You are free to delete columns from the Output Table and they will not show again even after refreshing. The next picture shows the same example, the only difference being that I’ve cleaned up the Output Table’s columns, and added an Allocated_Amount column.
Now, let’s take a look at what we would do if we decided that we wanted to split the regions using different rates for each quarter — this is the type of thing that completely destroys lesser, inflexible allocation models! There are 3 main steps to make this happen: adding a quarter column to the rate table, adding all the new rates for each quarter to the rate table, and updating the Nut_Join columns on both tables to concatenate in the quarter columns. Once these 3 steps are complete, you can refresh the Output Table and it should automatically take into account your new setup. The below picture is what happened after I did the 3 steps and refreshed the Output Table. You can see that the Americas are now being allocated differently depending on the quarter.
Hopefully it’s clear at this point how great this allocation method can be. You can allocate as many rows as Excel can handle without adding any complexity to your setup. You have the flexibility to change your mind midway through about things. It’s easy to maintain because there are not a lot of complex, or layout dependent, formulas involved that need updating. And when the time comes to hand this model to IT to build-out, that conversation should go quick and smoothly because tabular data and SQL is naturally how they think.
Gotchas To Be Aware Of
As I mentioned earlier in this article, there are some drawbacks to the SQL Left Join method of allocating within Excel. If you are not aware of them, they make the method quite difficult to use, and they are not well-documented. However, once you know about them, they are really manageable and worth the hassle to get the benefits of the method. I will go over the main gotchas below, as well as any ways I know of mitigating them.
First, Excel is finicky about what you have on the sheets you use as your left and right tables — the sheets that have Nut_Join1 and Nut_Join2. You can do what you like on the Output sheet, but on those other 2 sheets, you cannot have anything in the cells above your headers, or to the left/right of your data — it will cause errors when creating/refreshing the Output Table. So in this picture, you cannot have anything in the red areas. Even if a cell looks blank in these areas, if you have ever typed anything into the cell and then deleted its contents, Excel may not believe it is blank. If you encounter an error refreshing or creating the Output Table, try highlighting the entire rows or columns in the red area, and use the Clear All button on the Excel Ribbon’s Home Tab. Do this on both the Left and Right Table sheets.
Second, when you refresh the Output Table, Excel knows how to find the data it needs based on a Connection String associated with the table. You can see the Output Table’s connection string by right-clicking the table and selecting Table —> External Data Properties —> Connection Properties button —> Definition tab. Then looking at the box labeled Connection String. You can also see its SQL by looking at the Command Text box here. Anyway, if you look closely at the connection string, you’ll see that it actually has a path to your file hard-coded into it 2 times. This path is how Excel knows where to look for your left and right tables to perform your SQL Left Join. Since it is hard-coded the string will not update automatically if you move the file, or change its name, or email it to someone else. In those cases, you would need to modify the string manually to point back to the file wherever it is newly located/named. To help with this, I have made it so that if you run this 2Table Left Join Nutility while you have a cell selected within an Output Table that what previously created by the Nutility, it will not create a new table, but do a quick check of the selected table’s connection string. It will compare the path in the connection string to the path of the file, and if different, will fix the connection string path for you. So if you move your file from your desktop to a shared drive, for example. You could quickly fix the table by running the Nutility on the table after the move. Then, the file could be refreshed from that location by anyone. If, when the Nutility compared the connection string’s path to the file’s, it had found that they were the same, it will instead offer to help you write a macro that will fix the connection string for you each time macros are enabled in the workbook. This way you could email the file to some, and all they need do is save the file to their desktop, and enable the macro in order to start refreshing the table themselves. Further details on this macro is beyond the scope of this article. However, if you would like more info, feel free to email me at email@example.com
Third, again on the sheets you use as your left and right tables, Excel will ignore anything you have beyond column IV. There is no getting around this, but you should be able to have all the data you need before column IV, as long as you don’t have a lot of blank columns ahead of your data. A good practice would be to start your data in column A on these sheets, since you can’t have anything in the columns outside of your data anyway.
Fourth, and finally, your output table can only include up to 255 columns that come from your left and right tables, you may add more formula columns to the output table though. To say this another way, if you sum up all the columns of data you have on both your left and right table sheets, if the total is more than 255, you will encounter an error — even if you delete some of those columns off the final Output Table. There is no way around this, but 255 columns should be enough…
I’d like to shout-out a quick thank you to my friend Ryu who first showed me this allocation method. Thank you sensei!