Wednesday, July 11, 2012
Finding values in one table that are not in another
This site has a very good explanation of NOT IN, LEFT JOIN,where NULL, and NOT EXISTS statements for Finding values in one table that are not in another
Wednesday, May 2, 2012
Matching limited elements using Postgres IN clause
So you have two tables, one a sequence table holds unique sequence ids. The other table hold sequence mutations and has a 1 to many relationship with the sequence table. So for a given sequence id in the mutation table, there are many records, one for each mutation.
Now you have two sets of mutations you are interested in, a primary set A and a secondary set B. You want to know how many sequences contain 1 and only 1 mutation from set A and 1 and only 1 from set B.
select count(*) from (
SELECT distinct a.seq_id
FROM sequence a
JOIN seq_mutations b ON b.seq_id = a.seq_id
JOIN seq_mutations c ON c.seq_id = a.seq_id
WHERE b.mutation IN ('mutA','mutB','mutC')
AND c.mutation IN ('mut1', 'mut2', 'mut3', 'mut4', 'mut5')
GROUP BY a.seq_id
HAVING
count(distinct b.mutation) = 1
AND count(distinct c.mutation) = 1) a
This query gives you exact, fine control over the conditions you want to test for. If you then want to test for 1 and only 1 mutation in set A and 2 and only 2 mutations in set B, just change
count(distinct c.mutation) = 2
in the HAVING clause. This also lets you use > and <, so you can ask for 3 or more mutations by using
count(distinct c.mutation) >= 3
You can also vary the count for b.mutation to adjust the number of mutations allowed from set A, so any combination can now be extracted.
Thanks to Jake for the code help.
Now you have two sets of mutations you are interested in, a primary set A and a secondary set B. You want to know how many sequences contain 1 and only 1 mutation from set A and 1 and only 1 from set B.
select count(*) from (
SELECT distinct a.seq_id
FROM sequence a
JOIN seq_mutations b ON b.seq_id = a.seq_id
JOIN seq_mutations c ON c.seq_id = a.seq_id
WHERE b.mutation IN ('mutA','mutB','mutC')
AND c.mutation IN ('mut1', 'mut2', 'mut3', 'mut4', 'mut5')
GROUP BY a.seq_id
HAVING
count(distinct b.mutation) = 1
AND count(distinct c.mutation) = 1) a
This query gives you exact, fine control over the conditions you want to test for. If you then want to test for 1 and only 1 mutation in set A and 2 and only 2 mutations in set B, just change
count(distinct c.mutation) = 2
in the HAVING clause. This also lets you use > and <, so you can ask for 3 or more mutations by using
count(distinct c.mutation) >= 3
You can also vary the count for b.mutation to adjust the number of mutations allowed from set A, so any combination can now be extracted.
Thanks to Jake for the code help.
Monday, April 30, 2012
Convert Postgres column from text to numeric type
ALTER TABLE foo ALTER COLUMN col TYPE NUMERIC USING col::numeric
You can use this with data in the table. All data must be convertable (no alpha chars). You can also specify numeric formatting here if you like.
You can use this with data in the table. All data must be convertable (no alpha chars). You can also specify numeric formatting here if you like.
Subscribe to:
Posts (Atom)