Performance Tuning for Exabyte Queries

While NoSQL databases have definitely made their mark and have an important role in applications, there's also still a place for RDBMSes. The key advantage of an RDBMS is that, with a well normalized schema, any arbitrary query is possible, and instead of optimizing the query, you optimize the database itself to ensure you hit your performance goals- indexes, statistics, materialized views, etc..

The reality, of course, is wildly different. While the execution plan used by the database shouldn't be dependent upon how we write the query, it frequently is, managing statistics and indexes is surprisingly hard, and when performance problems crop up, without the right monitoring, it can be difficult to track down exactly which query is causing the problem.

Which brings us to this query, which TJ found while analyzing a performance problem.

select Min(the.moddate) "ModifiedDate" From T_91CDDC57 what , T_91CDDC57 the , T_91CDDC57 f where f.rdate > sysdate-1095;

First, let's just congratulate whoever named the table T_91CDDC57. I assume that's generated, and presumably so was this query. There's clearly a bug- there's no reason to have the same table in the FROM clause three times, when we just want to find the earliest moddate.

And that's the problem. T_91CDDC57 isn't a particularly large table. It's no pipsqueak- at 4.5M rows and 34M of data, it's certainly got some heft, but it's no giant, either. But that's 4.5M rows which have to be joined to 4.5M rows with no join condition, and then that gets joined to 4.5M rows again with no join condition.

Here's the explain plan of how this query executes:

----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 19P (1)| | 1 | SORT AGGREGATE | | 1 | 16 | | | 2 | MERGE JOIN CARTESIAN | | 18E| 15E| 19P (1)| | 3 | MERGE JOIN CARTESIAN | | 4328G| 31T| 5321M (1)| |* 4 | TABLE ACCESS FULL | T_91CDDC57 | 959K| 7499K| 18967 (2)| | 5 | BUFFER SORT | | 4509K| | 5321M (1)| | 6 | INDEX FAST FULL SCAN| T_91CDDC57_TYPE_INDEX | 4509K| | 5544 (1)| | 7 | BUFFER SORT | | 4509K| 34M| 19P (1)| | 8 | INDEX FAST FULL SCAN | T_91CDDC57_MODDATE_INDEX | 4509K| 34M| 4410 (1)| -----------------------------------------------------------------------------------------------

A few notable entries here. Line 4 does a TABLE ACCESS FULL. This is the f iteration of our table, and you can see that it pulls in just 959K rows thanks to our where clause. On line 8, you can see that it scans the T_91CDDC57_MODDATE_INDEX- it's using that index to sort so we can find the Min(the.moddate). You can also see that it touches 4509K rows. Line 6, also for 4509K rows, is our what access in the query.

Once we've accessed our three piles of data, we have to connect them. Since there's no ON or WHERE clause that links the tables, this connects each row from each table with each row in each other table. And you can see on line 3, where we join the first pair of tables, we suddenly have 4.3T rows, and a total of 31 terabytes of data. And when we join the third table it on line 2, that bloats to 18,000,000,000,000,000,000 rows and 15 exabytes of data.

TJ says:

Processing over 15 exabytes of information probably may have something to do with the performance…

Yeah, probably.

[Advertisement] Otter - Provision your servers automatically without ever needing to log-in to a command prompt. Get started today!

CodeSOD: A Ritual Approach

Frequent contributor Russell F stumbled across this block, which illustrates an impressive ability to make a wide variety of bad choices. It is written in C#, but one has the sense that the developer didn't really understand C#. Or, honestly, programming.

if (row["Appointment_Num"].ToString() == row["Appointment_Num"].ToString()) { bool b1 = String.IsNullOrEmpty(results.messages[0].error_text); if (b1 == true) { row["Message_Id"] = ($"{results.messages[0].message_id}"); row["message_count"] = ($"{results.message_count[0] }"); row["message_cost"] = ($"{results.messages[0].message_price }"); row["error_text"] = ($"{results.messages[0].error_text }"); row["network"] = ($"{results.messages[0].network }"); row["to"] = ($"{results.messages[0].to }"); row["status"] = ($"{results.messages[0].status }"); row["communicationtype_num"] = ($"1"); row["Sent_Date"] = (String.Format("{0:yyyy-MM-dd hh:mm:ss}", DateTime.Now)); row["Sent_Message"] = row["Sent_Message"].ToString(); } if (b1 == false) { row["Message_Id"] = "0000000000000000"; row["message_count"] = ($"{results.message_count[0] }"); row["message_cost"] = ($"{results.messages[0].message_price }"); row["error_text"] = ($"{results.messages[0].error_text }"); row["network"] = ($"{results.messages[0].network }"); row["to"] = ($"{results.messages[0].to }"); row["status"] = ($"{results.messages[0].status }"); row["communicationtype_num"] = ($"1"); row["Sent_Date"] = (String.Format("{0:yyyy-MM-dd hh:mm:ss}", DateTime.Now)); row["Sent_Message"] = row["Sent_Message"].ToString(); } }

Let's just start on the first line. This entire block is surrounded by a condition: row["Appointment_Num"].ToString() == row["Appointment_Num"].ToString(). If appointment num, as a string, matches appointment num, as a string, we can execute this code.

Inside of that block, we check to see if the error message IsNullOrEmpty. If it is, we'll turn it into a database row. If it's not, we'll also turn it into a database row, but with a hard-coded ID. At first glance, it's weird that they assign the IsNullOrEmpty check to a variable, but when you see that this code is written as two conditionals, instead of an if/else, you realize they forgot that else existed. They didn't want to do the IsNullOrEmpty check twice, so they stuffed it into a boolean.

It's also worth noting that the only difference between the two branches is the Message_Id, so there's a lot of duplicated code in there.

And that duplicated code is its own pile of WTFs. Clearly, the database itself is stringly-typed, as everything gets converted into a string. We mostly avoid using ToString, though, and instead use C#'s string interpolation. Which is'nt really a problem, but most of these fields are already strings. error_text, network, to, status, all strings. And `row["Sent_Message"] is a string, too- and we convert it to a string and store it in the same field.

After all that, I barely have the energy to wonder why they wrapped all of those assignments in parentheses. I suspect they misunderstood C# enough to think it's necessary. This whole thing has the vibe of "programming by incantation and ritual"- if we get the symbols in the right order and express them the right way, it will work, even if we don't understand it.

[Advertisement] Continuously monitor your servers for configuration changes, and report when there's configuration drift. Get started with Otter today!

CodeSOD: Low (Quality) Code

Like the tides, the popularity of low-code development environments comes in ebbs and flows. With each cycle, the landscape changes, old tools going away and new tools washing up on shore. One one hand, democratizing access to technology is good, on the other, these tools inevitably fail to actually do that. Instead, we get mission critical developed by people who don't understand how to develop, and are thus fragile, and released on platforms that are almost certainly going to be on legacy support in the next cycle.

I don't want to imply that low-code tools are automatically bad, or insult non-developers who want to give developing in a low-code environment a shot, though. Especially when professional developers can't really do any better.

John F's company has adopted one of the new tools in this cycle of low-code tools, Microsoft Power Apps, using the Power FX scripting language. One of the screens in their application needs to display a table of sales data, organized by month. This is how one of their developers decided to generate the list of months to draw:

          22,23,24,25,26,27,28,29], 17)

These numbers are meant to be an offset from the current month. The original developer wasn't sure how many months would be correct for their application, and wanted to experiment. This was their solution: hard-code a list that's longer than you'd reasonably want, and take the FirstN, changing N until it looks right.

Of course, "generating a sequential list of numbers of a certain length" is one of those infamous solved problems. In the case of Power FX, it's solved via the sequence function. Sequence(17, 0) would accomplish the same result.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!

CodeSOD: An Hourly Rate

When someone mentioned to Abraham, "Our product has an auto-sync feature that fires every hour," Abraham wasn't surprised. He was new to the team, didn't know the code well, but an auto-sync back to the server sounded reasonable.

The approach, however, left something to be desired.

syncTimer = new Timer(1000); syncTimer.Elapsed += new System.Timers.ElapsedEventHandler(syncTimer_Elapsed); syncTimer.Enabled = true; syncTimer.AutoReset = true; //... void syncTimer_Elapsed(object sender, System.Timers.ElapsedEventArgs e) { if (DateTime.Now.Second == 0 && DateTime.Now.Minute == 0) { //... do what you have to do } }

The .NET Timer object does exactly what you expect. In this case, 1,000 milliseconds after getting Enabled, it fires an event. Because AutoReset is true, it will fire that event every 1,000 milliseconds.

Now, this is supposed to trigger the auto-sync once every hour, but instead it fires once a second. On that second, we check the time- if we're on the first second of the first minute in the hour, then we fire the auto-sync operations.

There are a few problems with this. The first, and arguably most minor, is that it's just silly. Firing every second turns this timer into an "Are we there yet? Are we there yet? Are we there yet?" nuisance. It makes more sense to set the timer of 3600000 milliseconds and just have it fire once an hour.

The second problem is that this means nearly every client is going to sync at roughly exactly the same time. All of the load is going to spike all at once, at the top of the hour, every hour. That's manageable, but there's no business requirement that the sync happens at the top of the hour- any time would be fine, it should just only happen once in any given hour.

But the third problem is that this code doesn't guarantee that. This is .NET code running on Windows, and the Windows system clock can have a variety of different resolutions depending on version and configuration. In newer server versions, it could be as small as 1ms, if the sysadmin configured it for high accuracy. But this ran on clients, running desktop OSes, where the number would be closer to 15.6ms- which doesn't divide evenly into 1000ms. Which means this interval isn't going to fire exactly 1000ms apart- but 1005ms apart. Depending on what the OS is doing, it might drift even farther apart than that. But the key point is, some hours, this timer won't fire in the first second of the first minute.

The good news is that fixing this code by converting it to just fire once an hour at any time is a lot less than an hour's worth of work. Validating that in testing, on the other hand…

[Advertisement] ProGet’s got you covered with security and access controls on your NuGet feeds. Learn more.

Error'd: Counting to One

Two of today's ticklers require a little explanation, while the others require little.

Kicking things off this week, an anonymous reporter wants to keep their password secure by not divulging their identity. It won't work, that's exactly the same as my Twitch password. "Twitch seems to be split between thinking whether my KeePass password is strong or not," they wrote. Explanation: The red translates to "This password is too easy to guess", while the green 'Stark' translates as "you've chosen a very good password indeed."



Stymied in an attempted online drugs purchase, Carl C. sums up: "Apparently a phone number is a number. I'm waiting for them to reject my credit card number because I typed it with dashes." We here at TDWTF are wondering what they'd make of a phone number like 011232110330.



Unstoppered Richard B. tops Carl's math woes, declaring "hell, after a shot of this I can't even *count* to one, let alone do advanced higher functions!"



Regular contributor Peter G. highlights "This eBay auction will ship to anywhere in the UK except the bits of the UK that are in Bolivia, Liberia, Turkmenistan, Venezuela, and a few others." Pedantically, Peter, there is no error here, unless there is in fact some bit of the UK somewhere in Sierra Leone.



Pseudonymous spelunker xyzzyl murmurs "Something tells me the rest of videos and movies is also called undefined." Don't go in there! That's a horror flick, my friend.



[Advertisement] ProGet’s got you covered with security and access controls on your NuGet feeds. Learn more.

CodeSOD: Joining the Rest of Us

Using built-in methods is good and normal, but it's certainly boring. When someone, for example, has a list of tags in an array, and calls string.Join(" ", tags), I don't really learn anything about the programmer as a person. There's no relationship or connection, no deeper understanding of them.

Which, let's be honest, is a good thing when it comes to delivering good software. But watching people reinvent built in methods is a fun way to see how their brain works. Fun for me, because I don't work with them, probably less fun for Mike, who inherited this C# code.

public List<string> Tags {get; set;} /// <summary> /// Helper function to convert a tag list to a space-delimited string representation /// </summary> /// <returns>the tags as a string separated by a space</returns> public string ToSpaceDelimitedString() { return ToDelimitedString(' '); } /// <summary> /// Helper function to convert a tag list to a delimited string representation /// </summary> /// <param name="delimiter">the delimiter to insert between the tags</param> /// <returns>the tags as a string separated by the specified delimiter</returns> private string ToDelimitedString(char delimiter) { StringBuilder delimitedTags = new StringBuilder(); foreach (string tag in Tags) { delimitedTags.AppendFormat("{0}{1}", delimitedTags.Length > 0 ? delimiter.ToString() : string.Empty, tag) ; } return delimitedTags.ToString(); }

It's important to note that ToDelimitedString is only called by ToSpaceDelimitedString, which starts us off with a lovely premature abstraction. But what I really love about this, the thing that makes me feel like I'm watching somebody's brain work, is their approach to making sure they don't have leading or training delimiters.

delimitedTags.AppendFormat("{0}{1}", delimitedTags.Length > 0 ? delimiter.ToString() : string.Empty, tag)

On the first run of the loop, delimitedTags is empty, so we append string.Empty, tag- so just tag. Every other iteration of the loop, we append the delimiter character. I've seen lots of versions of solving this problem, but I've never seen this specific approach. It's clever. It's not good, but it's clever.

And, as is good practice, it's got a unit test:

[Test] public void ToSpaceDelimitedString() { TagList list = new TagList(_blogKey); string expected = "tag1 tag2 tag3"; foreach (string tag in expected.Split(' ')) { list.Add(tag); } string actual = list.ToSpaceDelimitedString(); Assert.AreEqual(expected, actual, "ToSpaceDelimitedString failed"); }

What's interesting here is that they know about string.Split, but not string.Join. They're so close to understanding none of this code was needed, but still just a little too far away.

[Advertisement] ProGet’s got you covered with security and access controls on your NuGet feeds. Learn more.

CodeSOD: Supporting Standards

Starting in the late 2000s, smartphones and tablets took off, and for a lot of people, they constituted a full replacement for a computer. By the time the iPad and Microsoft Surface took off, every pointy-haired-boss wanted to bring a tablet into their meetings, and do as much work as possible on that tablet.

Well, nearly every PHB. Lutz worked for a company where management was absolutely convinced that tablets, smartphones, and frankly, anything smaller than the cheapest Dell laptop with the chunkiest plastic case was nothing more than a toy. It was part of the entire management culture, led by the CEO, Barry. When one of Lutz's co-workers was careless enough to mention in passing an article they'd read on mobile-first development, Barry scowled and said "We are a professional software company that develops professional business software."

Back in the mid 2010s, their customers started asking, "We love your application, but we'd love to be able to access it from our mobile devices," Barry's reply was: "We should support standards. The standard is Microsoft Windows."

"Oh, but we already access your application on our mobile devices," one of the customers pointed out. "We just have to use the desktop version of the page, which isn't great on a small screen."

Barry was livid. He couldn't take it out on his customers, not as much as he wanted to, but he could "fix" this. So he went to one of his professional software developers, at his professional software company, and asked them to professionally add the following check to their professional business software:

Public Sub OnActionExecuting(filterContext As ActionExecutingContext) Implements IActionFilter.OnActionExecuting Dim userAgent As String = filterContext.HttpContext.Request.UserAgent If Not userAgent.Contains("Windows NT") OrElse userAgent.Contains("; ARM;") Then filterContext.Result = New ContentResult With {.Content = "Your operating system is not supported. Please use Microsoft Windows."} End If End Sub

Filtering users based on User-Agent strings is a bad idea in general, requiring it to contain "Windows NT" is foolish, but banning UA-strings which contain "ARM" is pure spite. It was added specifically to block, at the time, Windows RT- the version of Windows built for the Surface tablet.

There's no word from Lutz about which lasted longer: this ill-conceived restriction or the company itself.

[Advertisement] ProGet’s got you covered with security and access controls on your NuGet feeds. Learn more.

CodeSOD: Like a Tree, and…

Duncan B was contracting with a company, and the contract had, up to this point, gone extremely well. The last task Duncan needed to spec out was incorporating employee leave/absences into the monthly timesheets.

"Hey, can I get some test data?" he asked the payroll system administrators.

"Sure," they said. "No problem."

{ "client": "QUX", "comp": "FOO1", "employee": "000666", "employeename": { "empname": "GOLDFISH, Bob MR", "style": "bold" }, "groupname": "manager GOLDFISH, Bob MR", "drillkey": { "empcode": { "companyid": "FOO1", "employeeid": "000666" } }, "empleaves": { "empleave": [ { "empcode": { "companyid": "FOO1", "employeeid": "000333" }, "name": "AARDVARK, Alice MS", "candrill": 0, "shortname": "AARDVARK, Alice", "subposition": "", "subpositiontitle": "", "leavedays": { "day": [ "","","","","","","","","AL","","","","","", "","","","","","","","","","","","","","", "","","","" ] } }, { "empcode": { "companyid": "FOO1", "employeeid": "000335" }, "name": "AARDWOLF, Aaron MR", "candrill": 0, "shortname": "AARDWOLF, Aaron", "subposition": "", "subpositiontitle": "", "leavedays": { "day": [ "","","","","","","","","","","","","","", "","","","","","","","","","","","","","", "","" ] } } ] } } }

Well, there were a few problems. The first of which was that the admins could provide test data, but they couldn't provide any documentation. It was, of course, the leavedays field which was the most puzzling for Duncan. On the surface, it seems like it should be a list of business days within the requested range. If an employee was absent one day, it would get marked with a tag, like "AL", presumably shorthand for "allowed" or similar.

But that didn't explain why "AARDWOLF Aaron" had fewer days that "AARDVARK Alice". Did the list of strings somehow tie back to whether the employee were scheduled to work on a given day? Did it tie to some sort of management action? Duncan was hopeful that the days lined up with the requested range in a meaningful way, but without documentation, it was just guessing.

For Duncan, this was… good enough. He just needed to count the non-empty strings to drive his timesheets. But he feared for any other developer that might want to someday consume this data.

Duncan also draws our attention to their manager, "GOLDFISH, Bob MR", and the "style" tag:

I'm fairly sure that's a hint to the UI layer, rather than commentary on the Mr. Goldfish's management style.

[Advertisement] Continuously monitor your servers for configuration changes, and report when there's configuration drift. Get started with Otter today!

CodeSOD: Price Conversions

Russell F has an object that needs to display prices. Notably, this view object only ever displays a price, it never does arithmetic on it. Specifically, it displays the prices for tires, which adds a notable challenge to the application: not every car uses the same tires on the front and rear axles. This is known as a "staggered fitment", and in those cases the price for the front tires and the rear tires will be different.

The C# method which handles some of this display takes the price of the front tires and displays it quite simply:

sTotalPriceT1 = decTotalPriceF.ToString("N2");

Take the decTotalPriceF and convert it to a string using the N2 format- which is a number with thousands separators and two digits behind the decimal place. So this demonstrates that the developer responsible for this code understands how to format numbers into strings.

Which is why it's odd when, a few lines later, they do this, for the rear tires:

sTotalPriceT2 = decimal.Parse(decTotalPriceR.ToString("F2")).ToString("N2");

We take the price, convert it to a string without thousands separators, then parse it back into a decimal and then convert it to a string with thousands separators.

Why? Alone, this line would just be mildly irksome, but when it's only a few lines below a line which doesn't have this kind of ridiculousness in it, the line just becomes puzzling.

But the puzzle doesn't end. sTotalPriceT1 and sTotalPriceT2 are both string variables that store the price we're going to display. Because this price information may need to be retained across requests, though, someone decided that the prices also need to get stored in a session variable. In another method in the same class:

Session["FOS_TPriceF"] = bStaggered ? decimal.Parse(sTotalPriceT1).ToString("N2") : null; Session["FOS_TPriceR"] = bStaggered ? decimal.Parse(sTotalPriceT2).ToString("N2") : null;

Once again, we're taking a string in a known format, turning it back into the base numeric type, then formatting back to the format it already was. And I suppose it's possible that some other bit of code may have modified the instance variables sTotalPriceTN and broken the formatting, but it seems to me the solution is to not store numbers as strings and just format them at the moment of display.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!

Error'd: Money for Nothin'

... and gigs for free.

"Apple is magical," rhapsodizes music-lover Daniel W.



Meanwhile, in the overcast business district of Jassans-Riottier, where the stores are all closed but there's a bustle in the hedgerow, local resident Romain belts "I found the Stairway to Heaven just 100m from my house!"



Yes, there are two paths you can go by.



But in the long run ... you won't get there on any of these buses, shared by Alex Allan, wailing "you wait for one, and XXX all come together!"



Nor on this train of the damned from Finn Jere


No, it's very clear the mandatory mode is Zeppelin.

~ ~
[Advertisement] Keep the plebs out of prod. Restrict NuGet feed privileges with ProGet. Learn more.