This morning I knocked up a quick and dirty “blogmarks”:http://simon.incutio.com/archive/2003/11/24/blogmarks script. Being the control-freak I am I wanted to add categorisation to the blogmarks; but instead of just adding the blogmark (from now on referred to as BM) to one category I wanted multiple categories to be possible.
First thought was the usual ‘1 table for blogmarks; 1 for categories; and one for the blogmark/category relationships’. Usual way, if IMO a bit overcomplicated for such a simple job.
Then I though of “PostgreSQL’s”:http://www.postgresql.org “array support”:http://www.postgresql.org/docs/current/interactive/arrays.html. This would mean only 2 tables were necessary and would group the data more appropriately. Except… something was stirring in the back of my mind. Didn’t this undo some of the normalisation?
From the reading I’ve done PostgreSQL’s arrays aren’t supported as arrays directly in PHP. So it wouldn’t make any difference to using one column in the database table and storing the category ID’s in a comma separated format (which I believe is similar to what PostgreSQL does internally). Except that arrays are much easier to update than changing 17 to 20 in @1,5,17,3,2@.
A bit of googling didn’t turn up any useful information as to whether arrays in a database are considered harmful or not - so what do you guys think?
Oh, and the blogmarks will appear on this site when I can decide where to put them (and how to format them so it doesn’t look like a rip-off of Simon’s site)
4 comments ↓
Relational purists argue that arrays in a table are a violation of the relational model. I’d tend to agree.
In any case, I don’t think using arrays is the right thing to do in your situation. Keep in mind, whenever you design a database schema, you have to think of how it will be used. Given a scheme where you store an array of category IDs in a field, how will you be able to retrieve, say, all posts within a category?
According to the documentation you linked to, it seems *rudimentary* searching within arrays is possible in PostgreSQL, but I think you should heed their warning:
bq. Arrays are not sets; searching for specific array elements may be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale up better to large numbers of elements.
I like the idea of a categorised blogmark too!
If you manage to solve this one, could you share on your blog?
prozac
\ tenuate
tramadol
valium
viagra
vioxx \
xanax
xenical
h
zyban
Besides unecessarily breaking with relational theory, an array type also just seems like unecessarily adding complexity. There’s already a way to establish such a 1-N relationship, and syntax for using it and querying it. Adding an array type creates a second way of doing the same thing with need for additional syntax for all the same operations. All this to save creating an additional table / having to grasp how to join this additional table in your queries to use it. Instead, now you have to remember a bunch of new array operators instead of using the same SQL syntax you already know and use for other things.
This is known as the “law of parsimony”. In any model, or language, you don’t want to keep creating new concepts to enable you do things which you can already do with existing concepts. LESS concepts or constructs in a language or model is a Good Thing. It just like bloating an API with special case function for every little thing you can think of that you’d like to do.
I really don’t understand this dread of creating additional tables that so many SQL users seem to have.
Leave a Comment