SQL

From APIDesign

(Difference between revisions)
Jump to: navigation, search
(Internet Age)
Current revision (09:27, 10 September 2010) (edit) (undo)
 
(11 intermediate revisions not shown.)
Line 13: Line 13:
== Internet Age ==
== Internet Age ==
-
[[TBD]]: Why [[SQL]] is not well suited for the internet age...
+
Over the (successful) years many [[good]] practices for empowering [[SQL]] in storing data has been invented. We know how to [[wikipedia:Database_normalization|normalize a database]] so it is suitable for general purpose querying. However, read the previous sentence once again. It says ''general purpose querying''. Do you think that in the internet age we need ''general purpose querying''?
-
<comments>
+
Let's look at example of [[Bugzilla]]. [[Bugzilla]] is a bug tracking system backed by an [[SQL]] database. Does it help if the database is [[wikipedia:Database_normalization|normalized]]? No, not in the most common case. The majority of pages displayed by [[Bugzilla]] just show status of a single bug. Such status page includes information about the bug itself, about the list of people interested in it, about a reporter, it shows list of attachments and comments associated with the issue. Do you know what this means in terms of [[SQL]]?
-
I'd go with the idea that MDX are far superior for actual data mining and data-model construction. SQL does not natively represent complex data structures well. That said, SQL will probably remain around for quite some time -- it is easy, known, and fast.
+
 
-
</comments>
+
<source lang="sql">
 +
SELECT few_lines FROM ISSUES, PEOPLE, COMMENTS, ATTACHEMENTS;
 +
</source>
 +
 
 +
Basically the whole universe of data spread over various tables (because it is good habit to have [[wikipedia:Database_normalization|normalized database]]) is joined into a giant table. The select then filters tiny piece of such data related to the desired issue.
 +
 
 +
This is complete ''bulldozer'' approach - a form of [[cluelessness]]. Yes, we can use nice [[SQL]] language for doing [[wikipedia:relational algebra|relation algebra]], but the [[performance]] completely sucks.
 +
 
 +
If the details of a single issue were stored in a single file (possibly under a version control system to keep history), it would be enough to read such one file to render the [[HTML]] for the whole issue page.
 +
 
 +
Somehow over the course of time we have forgotten that [[SQL]] is good for generating reports and started to use it for storing any kind of data. As far as I can tell, those who design heavily loaded websites have already realized that and are seeking for alternatives (like the trivial one file per bug, or [[Lucene]], etc.). It is clear that effective web sites need something else than ''SELECT'' over all tables per each page request.
 +
 
 +
The above critique is not meant to say that [[SQL]] is useless. Even the [[bugzilla]] has reporting pages that can query and sort bugs by their status. For such tasks [[SQL]] is more than suitable. So [[SQL]] will not go away.
 +
 
 +
It is just time to get back to the roots. Realize that storing primary data in effective data structures can speed up [[performance]] of the most common operations your system does. Use [[SQL]] as tool to allow complex (but often rare) queries based on data in derived databases just like Sybase's original plan was in the seventies.
 +
 
 +
== External Links ==
 +
 
 +
* [http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010 No SQL at Twitter]
 +
 
 +
<comments/>
 +
 
 +
== [[Talk:SQL|Comments Written so Far]] ==
 +
 
 +
{{:Talk:SQL}}

Current revision

SQL is a DSL used to manipulate and especially query relational databases. The mathematical roots of SQL are based on relational algebra. SQL seems to be one of the most often used programming languages in software engineering, althrough these days it may be slightly over its zenith.

Contents

History

The SQL was on rise in the seventies and eighties driven by the (for example Sybase) vision to unify all data available and represent them in one big database. These data were collected from various other (non-relational) databases which many businesses used to have at that time (and very likely still continue to maintain even now). Over the night a snapshot of such data was converted into various SQL tables. Next morning, the management could then effectively perform various queries and found out more precisely than ever what is the state of the business.

In this scenario the SQL database is not the primary source of the data. It is more or less reporting tool. There is nothing wrong on that! Compared to previous age where one had to ask a programmer to write code to traverse various graphs and nets of data to find out an answer to a question. Then one had to wait for a few weeks until the question is coded and executed to get answer when it did not matter at all. With SQL queries everything is much more simplified. All the data are available to a simple SELECT. Coding of a query does not require orchestration of a various languages and systems. Cluelessness par excellence! Even non-programmers can perform or modify queries and they can see the results almost instantly (sometimes), much faster than in the previous era.

SQL is well suited for this reporting job. I can imagine companies using it becoming much more effective. The increase of effectiveness was so visible, that SQL become synonymy for a database. Since then people started to use SQL databases more and more. So finally the SQL database become not just a view over primary data, but the data itself.

All the data are stored in SQL, instantly available, ready for various selects. One does not have to wait 24 hours for a report, one can get the report immediately. Heaven isn't it? Or did we trade this flexibility for something essential?

Internet Age

Over the (successful) years many good practices for empowering SQL in storing data has been invented. We know how to normalize a database so it is suitable for general purpose querying. However, read the previous sentence once again. It says general purpose querying. Do you think that in the internet age we need general purpose querying?

Let's look at example of Bugzilla. Bugzilla is a bug tracking system backed by an SQL database. Does it help if the database is normalized? No, not in the most common case. The majority of pages displayed by Bugzilla just show status of a single bug. Such status page includes information about the bug itself, about the list of people interested in it, about a reporter, it shows list of attachments and comments associated with the issue. Do you know what this means in terms of SQL?

SELECT few_lines FROM ISSUES, PEOPLE, COMMENTS, ATTACHEMENTS;

Basically the whole universe of data spread over various tables (because it is good habit to have normalized database) is joined into a giant table. The select then filters tiny piece of such data related to the desired issue.

This is complete bulldozer approach - a form of cluelessness. Yes, we can use nice SQL language for doing relation algebra, but the performance completely sucks.

If the details of a single issue were stored in a single file (possibly under a version control system to keep history), it would be enough to read such one file to render the HTML for the whole issue page.

Somehow over the course of time we have forgotten that SQL is good for generating reports and started to use it for storing any kind of data. As far as I can tell, those who design heavily loaded websites have already realized that and are seeking for alternatives (like the trivial one file per bug, or Lucene, etc.). It is clear that effective web sites need something else than SELECT over all tables per each page request.

The above critique is not meant to say that SQL is useless. Even the bugzilla has reporting pages that can query and sort bugs by their status. For such tasks SQL is more than suitable. So SQL will not go away.

It is just time to get back to the roots. Realize that storing primary data in effective data structures can speed up performance of the most common operations your system does. Use SQL as tool to allow complex (but often rare) queries based on data in derived databases just like Sybase's original plan was in the seventies.

External Links

<comments/>

Comments Written so Far

MDX

I'd go with the idea that MDX are far superior for actual data mining and data-model construction. SQL does not natively represent complex data structures well. That said, SQL will probably remain around for quite some time -- it is easy, known, and fast.

. said ...

my friend - sql is an abomination so is html the standards are formulated so it's easy to learn and train mass mart engineers that are clueless most of the times anyway; only a hand full of people realize these problems: you're one of them.

Spatial databases has been around for ages in fact much longer than relational but harder to grasp hence the majority of sheep flock the other way.

Here it sounds like bugzilla simply made a poor or short sighted architectural/design decision - you cannot blame them for this they used what they know is best at the time. Lots of relational database today have wrappers for MDX allowing you to query a relational database in that form. Some even have special tools that cache index information on spatial form.

A mature or long running system normally indicates these problems you speak of.

No use crapping about it; hence the opportunity presents itself to introduce similar wrappers for whatever database bugzilla runs on, maybe MySQL.

I'd think hibernate to be a worthy candidate if they don't have it already.... !!!

--. 07:24, 1 September 2010 (CEST)

I can't imagine use of hibernate on any other than relational (aka SQL) database.

--JaroslavTulach 17:02, 1 September 2010 (UTC)

SMJ said ...

No valid point was made! This article is complete nonsense and grammatically deficient at that. Try some number agreement on your statements! The author shows how little he or she knows about the subject by creating new terms at will. SQL is neither modeling nor a database: it is a query language. It is right in the name Q is for query! Just like in OQL (object query language) Should your query performance be poor, ask for assistance from an expert: you are doing plenty wrong. 30 years of industry experience getting it right says this author is bogus, or at least too inexperienced to speak on this topic.


--SMJ 07:51, 1 September 2010 (CEST)

Dan Sheppard said ...

I feel the need to respond to SMJ's.

The author is not some novice. He wrote an application that is downloaded some 8 million times a year by fellow developers! His experience and ability is unquestionable!

Complete nonsense? No! SQL is a language and the author says so. SQL is however constructed around assumptions about how the data is stored (tables of data) and manipulated (set relation semantics). This provides a general tool and yes performance can be improved in a variety of ways. There is however no getting away from the fact than this arrangement of data is not always what is required and supporting SQL can give an unnecessary overhead. To quote the wikipedia page on NoSQL (which includes reference link to research) "Typical modern relational databases have shown poor performance on data-intensive applications including indexing a large number of documents, serving pages on high-traffic websites and delivering streaming media."... and so Google and Amazon store some of their data in non-relational databases with no SQL.

As one other example it should be noted that chess games are held in bespoke non-SQL databases by the various chess programs. Chess databases do not store games as tables of data but as trees so that games can be efficiently navigated through and queried. Queries are made against the characteristics of the position and not just against a set of data.

Grammatically deficient? I'm sure your Czech is much worse than his English! I've certainly seen worse grammar than in this article. SMJ's wasn't perfect either: "It is right in the name Q is for query!" is missing punctuation between the 2 clauses!

--Dan Sheppard 15:32, 1 September 2010 (CEST)

Thanks guys for your comments. At University we started to learn relational algebra and only then mapped its concepts into SQL. As there is almost 1:1 mapping between SQL and the relational algebra (well, there is no LIKE construct and sorting of results is probably impossible), as relational algebra is applicable only to relational databases, it all forms one big braid for me. Thus I call it all SQL. SQL databases, SQL DSL, etc.

Btw. I was in rush to publish the SQL blog before Sep 1, 2010 and as such I did not re-read it properly. I am not saying the grammar would be perfect if I did, but there might have been slightly less obvious typos.

--JaroslavTulach 17:02, 1 September 2010 (UTC)

Jiri Tulach said ...

It's all about data size and performance. There are many options. You can infuence it by DB architecture, selection of SQL machine, caching techniques, HW sizing and replication. NoSQL is also option but it's just a different data structure and you can also stuck in the similar troubles as in SQL. Also NoSQL is often used because many SQL machines are not easily scalable.

--Jiri Tulach 23:19, 2 September 2010 (CEST)

Čau Jirko. As I am Oracle employee since 1st Sep, 2010, and I was well educated during recent days' presentations I know that Oracle SQL database is perfectly horizontally scalable (compared to any competition). So you are right, it is a question of a good database. So guys, don't wait and buy one from my employer!

Encode everything into a columns and rows. Perform an SQL query which runs on indexes like crazy. Buy good database to make it effective. Be clueless! Sure, you could consider storing data in plain files are read each in less then 10ms and index this by Lucene. This solution could fit in 64MB machine (including Java runtime). But where would be the reference integrity? Where would be transaction isolation?

--JaroslavTulach 19:37, 3 September 2010 (UTC)

As a Oracle employee you should know what is the best solution ;-).

--Jiri Tulach 12:45, 6 September 2010 (CEST)

softwarevisualization said ...

I would say to anyone having doubts about eh relational model, not to say any particular SQL product, that they look at Chris Date's work, including his innumerable interviews and articles in which he takes on any and all complaints regarding the relational model.

Honestly, I've read a lot of his back and forth with a wide variety of opponents and I haven't ever judged him to be the loser in an argument.

Of all the books he's written over the years, The book he would tell you to read at this point in time (2011) is probably his Intro. to DB systems : (and at sig)

http://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844/ref=dp_ob_title_bk


I have no particular love for SQL, it's one of those technologies someone else can do better, and I let them. But the relational model which underlies all SQL products is not something whose limitations were are suddenly discovering- quite the opposite, it's the solution to those limitations.

As Date likes to point out, hierarchical models of data representation, even if that data is itself hierarchical, are a path we've been down before and the relational model is the solution to the intractable problems we encountered there. As Sanatayan said, those who dont' know history are doomed to repeat it and repeating history is what we're doing when we embrace things like OODBMS etc. etc. as models of representation.

But Date explains this so much better. If I thought I had a legitimate gripe against the relational model, not to say a database product, I would first check to see what Date had to say lest I leave myself open to eventually being exposed as someone who really does't know have a firm grasp on the subject :(

--softwarevisualization 03:20, 24 February 2011 (CET)

I don't feel real need to read book about SQL now (I still have three Dostojevkij's books to go through), but I found: http://broadcast.oreilly.com/2009/07/relational-databases-as-realit.html The article seems to indicate that Date explains SQL from a point of its relational algebra roots. I shall be well aware these roots, my profesor Jaroslav Pokorný started to explain the algebra to us and only after few months (when we knew concepts of the relational algebra by heart) he mentioned how to realize the same concepts using SQL. I have forgotten most of it already, but I understand the value of the mathematical model behind relational database. All I am saying is that often we pay too big price for the comfort of few joins and selects and that simple solution may be more effective. Of course, if you really value your data, Oracle's database is the best place to store them ;-)

--JaroslavTulach 12:35, 26 February 2011 (UTC)

Personal tools
buy