I was thinking about algorithms yesterday afternoon – yes, I am a strange one – and one thought led to another and soon I found myself contemplating palindromes – words or phrases that read the same forwards and backwards, such as “racecar” or Napoleon’s famous – albeit apochryphal – “Able was I ere I saw Elba!” Specifically, I was wondering what would be the most efficient way to test a string’s palindromicity using T-SQL.
Immediately I realized that this algorithm will need to accomplish two different things. I first need to remove all non-alphabetic characters from the string I am testing, because while “able was I ere I saw Elba” is palindromic even leaving the spaces intact, this will not work for other well-known palindromes such as “A man, a plan, a canal, Panama!” Then the second task is to check that the remaining string is the same front-to-back as it is back-to-front.
With the help of Elder’s Dead Roots Stirring album I set out to find the most efficient T-SQL code to accomplish this task. My plan was to avoid resorting to Google for the answer, but perhaps in a future post I will go back and compare my solution to the best one I can find online. For this first post in the series I will tackle only the first task of removing the non-alphabetic characters from the string.
I’ll readily admit to not being well versed in the use of regular expressions, so in order to meet my Google-free objective I’ll have to find another solution for now. Let’s start by seeing what we can do with the famed “A man, a plan, a canal, Panama!” Because I know exactly which characters are in this string, I can use the brute-force method of nested REPLACE() functions to parse them out – see line 11 below. (You may want to use the Toggle Line Wrap or Open Code In New Window options to see it better.)
-- Fun With Palindromes Query 1.01 -- This is the brute-force method of character removal. -- Use this method only if you have a limited number of distinct, -- known characters to remove DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; SET @Phrase_LettersOnly = REPLACE(REPLACE(REPLACE(@Phrase, ' ', ''), ',', ''), '!', ''); PRINT @Phrase_LettersOnly;
Of course I may not always have the luxury of knowing the contents of the string beforehand. I do know that the characters I want to keep will fall within the range of A-Z (ASCII values 65-90) or a-z (ASCII values 97-122). My plan is to step through the string one character at a time, but instead of using a WHILE loop, I’ll instead use the tally table (a table of sequential numbers) in my Admin database like so:
-- Fun With Palindromes Query 1.02 -- Using a tally table to separate out the characters in the string -- and keeping only those that fall in the desired ASCII range. USE Admin; GO DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; SELECT SUBSTRING(@Phrase, Tl.SequenceNumber, 1) FROM dbo.Tally Tl WHERE Tl.SequenceNumber <= LEN(@Phrase) AND (ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 65 AND 90 OR ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 97 AND 122) ORDER BY Tl.SequenceNumber;
If for some reason you don’t have or are not able to add a tally table to your system, you can use the ROW_NUMBER() function against the sys.objects DMO instead:
-- Fun With Palindromes Query 1.03 -- Using ROW_NUMBER() against the sys.objects DMO. SELECT SequenceNumber = ROW_NUMBER() OVER (ORDER BY name) FROM sys.objects;
You can put this code into a common table expression and combine it with the second query:
-- Fun With Palindromes Query 1.04 -- Using ROW_NUMBER() against the sys.objects in a CTE instead -- of a physical tally table. DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; WITH Tally AS ( SELECT SequenceNumber = ROW_NUMBER() OVER (ORDER BY name) FROM sys.objects ) SELECT SUBSTRING(@Phrase, Tl.SequenceNumber, 1) FROM Tally Tl WHERE Tl.SequenceNumber <= LEN(@Phrase) AND (ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 65 AND 90 OR ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 97 AND 122) ORDER BY Tl.SequenceNumber;
Of course, separate rows for each character don’t help us very much. Let’s use the CONCAT() function to put them back into a single string:
-- Fun With Palindromes Query 1.05 -- Using CONCAT() to put the letters-only string back together. USE Admin; GO DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; SELECT @Phrase_LettersOnly = CONCAT(@Phrase_LettersOnly, SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) FROM dbo.Tally Tl WHERE Tl.SequenceNumber <= LEN(@Phrase) AND (ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 65 AND 90 OR ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 97 AND 122) ORDER BY Tl.SequenceNumber; PRINT @Phrase_LettersOnly;
This method works great to parse a single string, but I will have a whole table of potential palindromes to parse. Historically, I would usually use the STUFF() function with FOR XML PATH to reassemble the strings like this:
-- Fun With Palindromes Query 1.06 -- Using STUFF() and FOR XML PATH to put the letters-only string -- back together. USE Admin; GO DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; SELECT @Phrase_LettersOnly = STUFF(( SELECT SUBSTRING(@Phrase, Tl.SequenceNumber, 1) FROM dbo.Tally Tl WHERE Tl.SequenceNumber <= LEN(@Phrase) AND (ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 65 AND 90 OR ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 97 AND 122) ORDER BY Tl.SequenceNumber FOR XML PATH('')), 1, 0, ''); PRINT @Phrase_LettersOnly;
I’ve always found using this method to be a bit non-intuitive and fiddly. Thankfully SQL Server 2017 has added a new function called STRING_AGG() that we can use instead:
-- Fun With Palindromes Query 1.07 -- Using STRING_AGG() to put the letters-only string back together. USE Admin; GO DECLARE @Phrase VARCHAR(50), @Phrase_LettersOnly VARCHAR(50); SET @Phrase = 'A man, a plan, a canal, Panama!'; SELECT @Phrase_LettersOnly = STRING_AGG(SUBSTRING(@Phrase, Tl.SequenceNumber, 1), '') WITHIN GROUP (ORDER BY Tl.SequenceNumber) FROM dbo.Tally Tl WHERE Tl.SequenceNumber <= LEN(@Phrase) AND (ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 65 AND 90 OR ASCII(SUBSTRING(@Phrase, Tl.SequenceNumber, 1)) BETWEEN 97 AND 122); PRINT @Phrase_LettersOnly;
Wow, that’s a lot of queries, so I’ll bring part one of this series to a close. I look forward to seeing some other solutions in the comments below. In part two, we’ll compare the performance of these methods across both a single string, and a whole table of them!