Tuesday, February 2, 2010

More on Ordering by Distinct Values ...

This is actually a problem I have been puzzling over for quite a while now. I actually need to do that sort. I wonder if I may have to create another column that has the list of ordering values in it in increasing order so for my example above assuming a letter table and a number table that contains the numbers for letters, the number table would not change but I would add this second column to the letter table:

B 0
A 0, 1
C 2
B 3

But I don't like that idea as it is a denormalization and would require extra maintenance.

I can think of a way to do it with a specified number of subsequent rows to sort by. In your example you order by the minimum value of the number column. If we wanted to order by the minimum, then by the second minimum, then by the third minimum we can use nested queries to select each of those values in different columns and order by them. I have developed query like this but there are some issues. First of all, getting the second and third lowest values requires nested queries themselves so this would require many nested queries (not sure if that is a problem). The number of nested queries increases based on how many levels down you want to sort by. The other problem with this method is that you have to specify how many levels down you want to sort by, you can't just sort by a concatenation of all numbers.

The real world example of this problem actually seems like it would be useful in many situations. Consider a task table, that has a list of tasks for people to accomplish and an assignee table that has a list of people assigned to the task. A task can have many people assigned to it. I want to get a list of tasks sorted by their assignees in alphabetical order so if the joined table looked like this:

Task Assignee
1 John
1 Mark
2 John
3 Mark

The result would be in this order:
2 John
1 John, Mark
3 Mark


Let me know what you think the best approach is.

Well, one way to handle this is to write a User Defined Function that returns a string concatenating distinct Assignees for the Task provided as a parameter. We can use a UDF similar to this one as an example. UDF's such as these are the simplest and most efficient way I have seen to handle concatenation at the database layer, though there are other methods you can try.

So, let's say we have this for a schema and sample data:

create table Tasks (TaskID int primary key, TaskName varchar(10))
go

create table TaskAssignees
( TaskID int references Tasks(TaskID), Assignee varchar(10), primary key (TaskID, Assignee))
go

insert into Tasks
select 1,'Task A' union all
select 2,'Task B' union all
select 3,'Task C'

insert into TaskAssignees
select 1, 'John' union all
select 1, 'Mark' union all
select 2, 'John' union all
select 3, 'Mark' union all
select 3, 'Ed'

We can create a UDF like this:

create Function GetTaskAssignees(@TaskID int)
returns varchar(100)
as
begin
declare @ret varchar(100)
set @ret = ''

select @ret= @ret + ', ' + Assignee
from TaskAssignees
where TaskID = @TaskID
order by Assignee

return substring(@ret,3,100)
end

... and get the output you are looking for like this:

select TaskID, TaskName, dbo.GetTaskAssignees(TaskID) as Assignees
from Tasks
order by Assignees

TaskID TaskName Assignees
----------- ---------- --------------
3 Task C Ed, Mark
2 Task B John
1 Task A John, Mark

So, this should actually work for well, though for large sets of data performance may be an issue. As an added bonus, this handles presentation of the names assigned to each task for you as well.

Depending on the data, however, you may need to concatenate items of a fixed length, padded by spaces, instead of simply comma-separated. This would apply if you are sorting by numeric values, such as:

1,23,45
1,3
6,12,4

Notice that "1,23,25" sorts before "1,3" in the example above, since it is just comparing two strings. To solve this, you'd have to write the UDF to output like this:

1 , 3
1 ,23 ,45
6 ,12 , 4

That way, " 3" (padded to the right with a space) sorts correctly before "23". You could also pad with leading zeroes, or padding after the value instead of before. The trick here is to identify how much padding you need.

Friday, January 29, 2010

Search and Replace SQL Server data in all columns, of all tables

--To replace all occurences of 'America' with 'USA':
EXEC SearchAndReplace 'America', 'USA'
GO

Here is the complete stored procedure code:
(Note: If you only want to search, but not replace, try this procedure instead: How to search all columns of all tables in a database for a keyword?)


CREATE PROC SearchAndReplace
(
@SearchStr nvarchar(100),
@ReplaceStr nvarchar(100)
)
AS
BEGIN

SET NOCOUNT ON

DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110), @SQL nvarchar(4000), @RCTR int
SET @TableName = ''
SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')
SET @RCTR = 0

WHILE @TableName IS NOT NULL
BEGIN
SET @ColumnName = ''
SET @TableName =
(
SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
AND OBJECTPROPERTY(
OBJECT_ID(
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
), 'IsMSShipped'
) = 0
)

WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
BEGIN
SET @ColumnName =
(
SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
AND QUOTENAME(COLUMN_NAME) > @ColumnName
)

IF @ColumnName IS NOT NULL
BEGIN
SET @SQL= 'UPDATE ' + @TableName +
' SET ' + @ColumnName
+ ' = REPLACE(' + @ColumnName + ', '
+ QUOTENAME(@SearchStr, '''') + ', ' + QUOTENAME(@ReplaceStr, '''') +
') WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
EXEC (@SQL)
SET @RCTR = @RCTR + @@ROWCOUNT
END
END
END

SELECT 'Replaced ' + CAST(@RCTR AS varchar) + ' occurence(s)' AS 'Outcome'
END

Find Text In Store Procedure

CREATE PROCEDURE Find_Text_In_SP
@StringToSearch varchar(100)
AS
SET @StringToSearch = '%' +@StringToSearch + '%'
SELECT Distinct SO.Name
FROM sysobjects SO (NOLOCK)
INNER JOIN syscomments SC (NOLOCK) on SO.Id = SC.ID
AND SO.Type = 'P'
AND SC.Text LIKE @stringtosearch
ORDER BY SO.Name
GO