Tags in Oracle Nested Tables
Tags are a useful means of associating keyword metadata with a data structure such as an image, a video, audio or a piece of text. There are many ways of storing tags in a database – for example one could store a string of tags separated by a space, comma or semi-colon. Alternatively, one could store tags in a database table and then create another table to hold the list of tags for a given entity. My preferred method of storing tags, in an Oracle database at least, is to use the concept of nested tables.
Nested tables are a type of PL/SQL collection which may be compared to arrays or objects in other programming languages. One of the advantages of using nested tables over other PL/SQL collection types is that they don’t have a fixed size, and they can be stored in a database table (not just in a program unit, like regular PL/SQL tables or associative arrays). The best way to show how nested tables work is to use an example, so let’s dive right in and set up our database for the tags example.
The first thing we want to do is create the nested table type in our database. I’m going to keep it nice and simple and name my type “tags”:
CREATE TYPE tags IS TABLE OF VARCHAR2(50);
If you are familiar with PL/SQL tables, the above should look very similar to the syntax for creating a PL/SQL table, but without the typical “index by binary_integer” bit stuck at the end. With our type created, we can now go ahead and use this in a database table. I’m going to create a table named “post”, which one might use to store blog posts. This table will have three columns – post_id, post_title and post_tags.
CREATE TABLE post (
post_id           NUMBER(11),
post_title       VARCHAR2(50),
post_tags       tags
) NESTED TABLE post_tags STORE AS tags_table;
Great – you now have a database table with a column of type “tags” which is itself a table. The next thing we want to do is insert some records into our new table.
INSERT INTO post(post_id, post_title, post_tags)
VALUES(1, 'First Post', tags('oracle', 'tags', 'example'));
INSERT INTO post(post_id, post_title, post_tags)
VALUES(2, 'Second Post', tags('oracle', 'tags'));
INSERT INTO post(post_id, post_title, post_tags)
VALUES(3, 'Third Post', tags('oracle'));
COMMIT;
As you can see from the above INSERT statements, inserting tags into our nested table is very straightforward, we simply tell Oracle that we are using the type “tags” and pass the VARCHAR2 values we want to store to it. If you try to insert a tag with over 50 characters you will get an error as we defined the tags table as a table of VARCHAR2(50) fields. So what does our new table look like when we go to retrieve data from it? Let’s take a look.
SELECT * FROM post;
You should see output like the following
POST_ID POST_TITLE                                               POST_TAGS ------------ ----------------------------------------------------------- --------- 1 First Post                                                 <Object> 2 Second Post                                                <Object> 3 Third Post                                                 <Object>
Hardly ideal is it? Fortunately, we can use the TABLE function to easily get back a list of tags for a given post quite easily. Try the following SQL statement:
SELECT column_value
FROM TABLE(
SELECT post_tags FROM post WHERE post_id = 1
);
This outputs the following:
COLUMN_VALUE -------------------------------------------------- oracle tags example
Try the statement again, but this time pass in the post_id value for another row. This is useful, as it gives us a means of getting the tag data out of the database, but what if we want to get back all the tags when retrieving the other columns in the table? This requires a bit more work, but it’s not exactly complex:
SELECT p.post_id, p.post_title, t.column_value
FROM post p, TABLE(
SELECT post_tags FROM post WHERE post_id = p.post_id
) t;
The above statement should produce the following:
POST_ID POST_TITLE                                             COLUMN_VALUE ------------ ------------------------------------------------------- --------------- 1 First Post                                             oracle 1 First Post                                             tags 1 First Post                                             example 2 Second Post                                           oracle 2 Second Post                                            tags 3 Third Post                                            oracle
A step in the right direction, but the post_id and post_title are duplicated for each tag for that record. Surely there’s a way of outputting all the tags on a single row, joining them together with a delimiting character such as a comma? There are a number of ways of doing this, from using a stored function to using 11g’s WITHIN GROUP feature. The method with the least code that works on 9i and above is to use the XMLAGG function to produce the concatenated result. The SQL to achieve the desired result in this manner is as follows:
SELECT p.post_id, p.post_title,
RTRIM(XMLAGG(XMLELEMENT(e, t.column_value || ',')).EXTRACT('//text()'), ',') tags
FROM post p, TABLE(
SELECT post_tags FROM post WHERE post_id = p.post_id
) t
GROUP BY p.post_id, p.post_title;
This produces the following result:
POST_ID POST_TITLE                              TAGS ------------ ---------------------------------------- ----------------------- 1 First Post                              oracle,tags,example 2 Second Post                             oracle,tags 3 Third Post                              oracle
So far we haven’t seen any particular reason why one would necessarily use a nested table to store tags over any other method. For me, the primary advantage is that it makes it very simple to perform group functions on the tags – making it ridiculously easy to generate tag clouds based on how many times a tag is used. In the above example, we can easily see that the tag “oracle” is used 3 times, “tags” is used twice and “example” only once. But in a scenario where you have many posts and many, many tags, it is impossible to do this counting in your head. Let’s take a look at just how easy it is to do this counting with our nested table.
SELECT t.column_value, COUNT(*)
FROM post p, TABLE(
SELECT post_tags FROM post WHERE post_id = p.post_id
) t
GROUP BY t.column_value;
And the output:
COLUMN_VALUE                                        COUNT(*) -------------------------------------------------- ---------- tags                                                       2 example                                                    1 oracle                                                     3
Want to order the results? Simply add an ORDER BY clause and Bob’s your uncle.
SELECT t.column_value, COUNT(*)
FROM post p, TABLE(
SELECT post_tags FROM post WHERE post_id = p.post_id
) t
GROUP BY t.column_value
ORDER BY 2 DESC;
The result:
COLUMN_VALUE                                        COUNT(*) -------------------------------------------------- ---------- oracle                                                     3 tags                                                       2 example                                                    1
For me, this is the main reason I store tags in a nested table – but there are other advantages, too. What if you are editing a post and you want to change one of the tags? Nested tables make this simple:
UPDATE TABLE(SELECT post_tags FROM post WHERE post_id = 1) SET column_value = 'sample' WHERE column_value = 'example'
Similarly, if you wanted to delete a tag from a row, this is also very straightforward:
DELETE FROM TABLE(
SELECT post_tags FROM post WHERE post_id = 1
)
WHERE column_value = 'tags';
Nested tables are a powerful PL/SQL collection type, and you can do even more powerful things with them if you start using PL/SQL procedures and functions. But the beauty about nested tables for me is how powerful and flexible they are using nothing but raw SQL statements. For further information about nested tables and other PL/SQL collection types, see the following resources:
- http://www.developer.com/db/article.php/10920_3379271_1/Oracle-Programming-with-PLSQL-Collections.htm
- http://www.dba-oracle.com/t_display_multiple_column_values_same_rows.htm
- http://www.devshed.com/c/a/Oracle/Database-Interaction-with-PLSQL-Nested-Tables/
- http://soft.buaa.edu.cn/oracle/bookshelf/Oreilly/prog2/ch19_01.htm
- http://sql-plsql.blogspot.com/2007/05/oracle-plsql-nested-tables.html
- http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96624/05_colls.htm
- http://www.smart-soft.co.uk/Oracle/oracle-plsql-tutorial-part-9.htm
Any responses to those who say that nested tables are just parent/child tables in disguise, and that it actually makes it harder to deal with the data?
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8135488196597
chris marx
14 Oct 10 at 2:22 pm
It’s a fair point Chris. I guess what I like about them is it allows you to use Oracle to store data much like a document-oriented database (e.g. CouchDB) would – with everything related to a record self-contained.
Joe Lennon
18 Oct 10 at 9:09 am
well, i agree, that’s what got me on them too
but the dba’s aren’t sold yet…
chris marx
18 Oct 10 at 2:08 pm
Well strictly speaking they are probably right. I don’t believe nested tables are actually stored in the same table as the parent table, so there are issues with the argument that they have any particular benefit over traditional related tables. Some might argue that if you want to store self-contained document-oriented records, you should store the data in an XML column or store a big string of JSON and use something like PL/JSON to work with it.
Personally, I try not to get too bogged down about what certain people think and just go with whatever works right for me, and I like using the nested tables solution for tags, whether or not Tom Kyte thinks I should.
Joe Lennon
18 Oct 10 at 3:21 pm