×

Loading...
Ad by
  • 推荐 OXIO 加拿大高速网络,最低月费仅$40. 使用推荐码 RCR37MB 可获得一个月的免费服务
Ad by
  • 推荐 OXIO 加拿大高速网络,最低月费仅$40. 使用推荐码 RCR37MB 可获得一个月的免费服务

请教一个SQL QUERY问题,比较两个table data and get the difference.

本文发表在 rolia.net 枫下论坛First, what's 'table' called in Chinese?

I have two tables with exactly same schema. Each table has 4 columns, they are id, period, field_name and field_value. In #delta table, I have new/updated data and #previous table has old data. I need to use the #delta data to retrieve any data from #previous where they have same id, period but different field_name. Given the following example, table #delta has two records with id as 1, period as A. My query result should come out as any records in #previous having id=1, period=A, but field_name !='field1' or field_name!='field2'. So, only two records (bottom 2, not 3rd one) from #previous meet the condition. field_name has hundreds of name in it.

I have my first query done. I have duplicated records which is understandable. So, I added distinct after 'select'. Is there faster way to run the query. This sample has 5 rows only. In actualy database, there are half million of rows on each table. Speed is important. I will run the test and post the result later.

Thanks, everyone.


--script one
create table #delta(id int, period varchar(20),field_name varchar(20),field_value float)
insert into #delta values (1,'A','field1',1)
insert into #delta values (1,'A','field2',2)

--script two
create table #previous(id int, period varchar(20),field_name varchar(20),field_value float)
insert into #previous values (2,'C','field2',3)
insert into #previous values (3,'A','field5',4)
insert into #previous values (1,'A','field1',3)
insert into #previous values (1,'A','field5',4)
insert into #previous values (1,'A','field6',1)

--script three
select * from #delta
select * from #previous

--script four
select d.id,d.period,p.field_name,p.field_value
from #delta d, #previous p
where not exists(select 1 from #delta dd
where dd.id=p.id and
dd.period=p.period and
dd.field_name=p.field_name) and
d.id=p.id and
d.period=d.period and
d.field_name!=p.field_name更多精彩文章及讨论,请光临枫下论坛 rolia.net
Report

Replies, comments and Discussions:

  • 工作学习 / 专业技术讨论 / 请教一个SQL QUERY问题,比较两个table data and get the difference.
    本文发表在 rolia.net 枫下论坛First, what's 'table' called in Chinese?

    I have two tables with exactly same schema. Each table has 4 columns, they are id, period, field_name and field_value. In #delta table, I have new/updated data and #previous table has old data. I need to use the #delta data to retrieve any data from #previous where they have same id, period but different field_name. Given the following example, table #delta has two records with id as 1, period as A. My query result should come out as any records in #previous having id=1, period=A, but field_name !='field1' or field_name!='field2'. So, only two records (bottom 2, not 3rd one) from #previous meet the condition. field_name has hundreds of name in it.

    I have my first query done. I have duplicated records which is understandable. So, I added distinct after 'select'. Is there faster way to run the query. This sample has 5 rows only. In actualy database, there are half million of rows on each table. Speed is important. I will run the test and post the result later.

    Thanks, everyone.


    --script one
    create table #delta(id int, period varchar(20),field_name varchar(20),field_value float)
    insert into #delta values (1,'A','field1',1)
    insert into #delta values (1,'A','field2',2)

    --script two
    create table #previous(id int, period varchar(20),field_name varchar(20),field_value float)
    insert into #previous values (2,'C','field2',3)
    insert into #previous values (3,'A','field5',4)
    insert into #previous values (1,'A','field1',3)
    insert into #previous values (1,'A','field5',4)
    insert into #previous values (1,'A','field6',1)

    --script three
    select * from #delta
    select * from #previous

    --script four
    select d.id,d.period,p.field_name,p.field_value
    from #delta d, #previous p
    where not exists(select 1 from #delta dd
    where dd.id=p.id and
    dd.period=p.period and
    dd.field_name=p.field_name) and
    d.id=p.id and
    d.period=d.period and
    d.field_name!=p.field_name更多精彩文章及讨论,请光临枫下论坛 rolia.net
    • table就叫“表”。
      两种方法,该用哪种取决于#delta的大小,你可以自行比较。

      SELECT p.*
      FROM #previous p
      WHERE EXISTS (SELECT 1 FROM #delta d WHERE d.id = p.id AND d.period = p.period)
      AND NOT EXISTS (SELECT 1 FROM #delta d WHERE d.id = p.id AND d.period = p.period AND d.field_name = p.field_name);



      SELECT p.*
      FROM #previous p
      ,(SELECT DISTINCT id,period FROM #delta) d2
      WHERE d2.id = p.id AND d2.period = p.period
      AND NOT EXISTS (SELECT 1 FROM #delta d WHERE d.id = p.id AND d.period = p.period AND d.field_name = p.field_name);
    • 你用的是什么DB? 不同的DB有不同的feature,solution也会不一样。不知道用JOIN是不是会快一点,你可以试试。容我再想想有没有其它tricky点的方法。
      select DISTINCT
      p.id,p.period,p.field_name,p.field_value
      from #delta d
      JOIN #previous p
      ON d.id = p.id
      AND d.period = p.period
      AND d.field_name != p.field_name
      AND d.field_value != p.field_value
      • 嗯,把最后一行去掉就符合他的要求了。
        我又把楼主的问题看了一下,两张表都有50万行,我的第二种方法显然不行了。
    • newkid's two query returns right answer. bdbs's query returns is not that right.
      the middle row (1,'A','field1',3) of #previous should not return. because in #delta table, row 1 has (1,'A','field1',x) already.

      I am using SQL 2005(+sp2).

      I need to dump some data in to do the performance comparing. Will give you update tomorrow.
      • 我审题不清,考虑到DISTINCT的performance不好,看来只有newkid的方法一可行了。
        • DISTINCT 作用于一个大数据集是会有PERFORMANCE问题,不知道楼主对查询结果有个估计没有,是什么数量级的,如果集合不大那问题也不大。这些都不好说,必须实际测试了才知道。
    • too long post. no time to read, but I wonder if "minus" would help in this? do not know if SQL S has a minus though
      • Seems you shot every single post whole day long, no matter what topic related, involve, just deeply involved。。。