Nigel Thomas

Subscribe to Nigel Thomas feed
Preferisco is Italian for "I prefer" - things like Oracle development and performance, the EAI / ETL scene, and of course our farmhouse in central Italy.Nigel
Updated: 14 hours 57 min ago

Thirty years on, 10gen emulates Oracle

Wed, 2013-04-03 11:38
It's now more than thirty years since I first came across the Oracle database. At that time, Oracle had only just got a distributor in the UK (a small part of CACI, with just three staff: Geoff Squire, Mike Evans and Chris Ellis - soon to be joined by Ian Thacker). We selected Oracle for an MOD project and the rest is history.

SQL and relational database were a completely new thing then (actually, any databases were a pretty new thing - I'd got through an entire Computer Science degree without ever coming across even a sniff of a database). During the mid '80s, Oracle UK did a fantastic job of proselytising relational as a concept, and SQL as the database lingua franca. They ran regular "no obligation" seminars that were accessible to both techies and their managers; these were held frequently in London, and from time to time around the rest of the UK (and through Europe as things took off). I don't know whether the same was true in the USA, but I expect it was. The seminars generated a lot of interest, and I'm sure the campaign is one of the principal reasons Oracle was able to steal the top dog position it has held ever since in the RDBMS world.

Well, here we are and 10gen (G+: +MongoDB, - developers of MongoDB - seem to have learned the lesson. They are doing a heck of a lot to spread the word, including (since last October) running free introductory online courses. I've just completed the sixth and last full week of M101J - MongoDB for Java Developers (week 7 is the final exam). More on the detail in another post - but the main point is that this relatively small amount of investment in education should generate a lot of demand (not to mention good feeling). If you're interested, the courses are being run on a cycle - next one to start is M102 - MongoDB for DBAs, on April 29 - see for more details.

Even if you don't want to be drinking the Kool-Aid, working through the course is a great way of beginning to understand the strengths and weaknesses of MongoDB in particular, or NoSQL / document oriented databases in general.

Managing Windows scheduled tasks - SCHTASKS output misleading

Thu, 2012-01-26 07:04
Here's a little gem - found on Windows Server SP2 but still there on Windows Server 2008 R2 SP1 at least.

I wanted to write a little script to disable some scheduled tasks (for maintenance) then after a predetermined time to re0-enable them. This is a common support problem, and I find I often complete the maintenance and forget to re-enable the tasks which results in alarms going off - but maybe not until the start of the next working day.

Anyhow, Windows gives you (at least) two ways of interacting with scheduled tasks:

  2. PowerShell and the PowerShellPack which includes a TaskScheduler module
Although Powershell is an attractive option for scripting, PowerShellPack is poorly documented and the TaskScheduler module is a bit lacking. You can create, start, stop, register or get a task but there doesn't seem to be a cmdlet for actually enabling or disabling a task.

So, back to using groovy as a wrapper to SCHTASKS. All fine, we can use execute to create a CMD process that calls SCHTASKS /query to get the task status. Here's an example using easy-to-parse CSV format:

C:\>schtasks /query /fo csv /tn "\Apple\AppleSoftwareUpdate"
"TaskName","Next Run Time","Status"
"\Apple\AppleSoftwareUpdate","31/01/2012 11:15:00","Ready"

We can see that "Status" is in field 3 on the heading line, and its value is "Ready" on the data line. That's great.
To disable the task, we can then:

C:\>schtasks /change /disable  /tn "\Apple\AppleSoftwareUpdate"
SUCCESS: The parameters of scheduled task "\Apple\AppleSoftwareUpdate" have been changed.

Now let's check the status again:

C:\>schtasks /query /fo csv /tn "\Apple\AppleSoftwareUpdate"

"TaskName","Next Run Time","Status"

Yay! The task is indeed disabled - but look how the status has swapped into field 2 - under "Next Run Time". Presumably because there is no next run time while the task is disabled. A blank 3rd field value has been provided, but it is in the wrong place. Whatever way you list out the data, the error is still there:

C:\>schtasks /query /fo table /tn "\Apple\AppleSoftwareUpdate"

Folder: \Apple
TaskName Next Run Time Status
======================================== ====================== ===============
AppleSoftwareUpdate Disabled

C:\>schtasks /query /fo list /tn "\Apple\AppleSoftwareUpdate"

Folder: \Apple
HostName: PIERO
TaskName: \Apple\AppleSoftwareUpdate
Next Run Time: Disabled
Logon Mode: Interactive/Background

OK, now I know this, I can work around it. But another example of MS inconsistency (which no doubt is now firmly baked in for "backward compatibility" for ever and a day...

Breaking change in calling Groovy on 1.8 upgrade

Tue, 2011-11-29 10:07
I've been bitten by this a couple of times now, so for anyone else's benefit: If you have a bat file that calls a groovy program, you may notice surprising behaviour after an upgrade from 1.7.x to 1.8.x (I went from 1.7.4 to 1.8.4).

If your bat file looks something like:
..some stuff..

groovy myGroovy
copy xyz abc

... more stuff ..
Then in 1.7.4 you would have called groovy.exe, executed the program, then continued to copying the file. But in 1.8.x groovy.exe is deprecated so instead you execute groovy.bat. Unfortunately, when a Windows bat script calls another in that way, it effectively jumps to the script (with no return) so the script finishes at the end of groovy.bat. To fix this, use the Windows CALL instruction:
..some stuff..

call groovy myGroovy
copy xyz abc

... more stuff ..
With the CALL, the groovy.bat script executes and then returns control to your script, and the copy and more stuff actually happens.

NOTE: I think the reason I have the problem is that I installed the generic groovy rather than using the specific windows installer (eg here). But codehaus seems to be down right now.

MySQL Group By is a little too indulgent

Thu, 2011-07-28 07:54
After 30 years of Oracle, I've found myself using MySQL recently. I came across a little thing that surprised me. I'm by no means the first to trip over this - I found this 2006 post from Peter Zaitsey on the same topic.

MySQL lets you write a group by statement that references columns that aren't in the group by, and aren't aggregates. For example:

mysql> select table_name, column_name, count(*)
-> from information_schema.columns
-> where table_schema = 'information_schema'
-> group by table_name
-> limit 5;
| table_name | column_name | count(*) |
5 rows in set (0.07 sec)

A similar query from any version of Oracle would fail:

SQL> select table_name, column_name, count(*)
2 from dba_tab_columns
3 group by table_name;
select table_name, column_name, count(*)
ERROR at line 1:
ORA-00979: not a GROUP BY expression

In effect MYSQL is doing the GROUP BY as requested, and giving you the first value it comes across for the un-aggregated columns (COLUMN_NAME in this example). A near equivalent Oracle query would be:

SQL> select table_name, min(column_name), count(*)
2 from dba_tab_columns
3* group by table_name

------------------------------ ------------------------------ ----------
ICOL$ BO# 14

But in the Oracle case we are explicitly selecting the MIN(column_name), whereas MySQL's laxer behaviour is just picking the first column name at random (or rather, dependent on the execution plan).

So: when grouping in MySQL, make double certain that your SQL is really returning the number of rows you expected. In our example it is possible that the intention was actually the very different:

mysql> select table_name, column_name, count(*)
-> from information_schema.columns
-> where table_schema = 'information_schema'
-> group by table_name, column_name
-> limit 20;
| table_name | column_name | count(*) |
20 rows in set (0.06 sec)

Happy debugging everyone!

OT: Good value first class fare?

Fri, 2010-11-26 05:43
From the NationalRail website:

One million squids for a day return?

Using Groovy AntBuilder to zip / unzip files

Mon, 2010-06-14 06:23
I've been quiet for quite a while - partly because I am not working with Oracle just at the moment. I have been building some automated workflow systems using Groovy as the scripting language. I've known about Groovy since James Strachan first invented it back in around 2002/3 - but this is the first time I've really been using it in earnest. It's a great for portable scripts, and for integration with Java (it runs in the JVM). It's much friendlier than Java for someone like me who comes from a PL/SQL and C (not C++) background.

Anyhow, I found out about using Groovy Antbuilder tasks, and have been using them to manage zipping / unzipping file sets:

def ant = new AntBuilder(); // create an antbuilder
ant.unzip( src: planZipFile, dest:workingDirName, overwrite:"true")

Then I found I wanted to flatten the output (ie don't reproduce the directory structure). The Apache Ant documentation for the unzip task shows the Ant XML:

<unzip src="" dest="${tools.home}">
<include name="apache-ant/lib/ant.jar"/>
<mapper type="flatten"/>

How to add the mapper element?

Well, lots of googling later, I couldn't find an example but I did see the patternset being used. So thanks to that, I found that the Groovy way of expressing the mapper part of this is to add a closure after the call:
def ant = new AntBuilder();
ant.unzip( src: planZipFile, dest:workingDirName, overwrite:"true"){ mapper(type:"flatten")};

So I hope someone finds that useful.

Customer satisfaction - the Xerox Effect

Fri, 2010-02-12 04:42
Thanks to Martin Widlake for pointing to this gem of a paper from Dennis Adams (pdf), pointing out that an increase in customer satisfaction can lead to an increase in negative feedback, and vice versa. Anyone who has worked in customer support (whether on an internal help desk or for external customers) will have gone through a "why don't they love us, we're doing such a great job for them?" period. This might explain why.

User friendly / supported monitoring of concurrent processes

Tue, 2009-10-13 10:59
Yes, I know everyone else is having a great time at OOW, but some of us are back in the real world still.

I've asked a question on OTN (under EBS General Discussion) Best way to execute / monitor long running custom conc request with slave.

Can anyone help me with suggestions for an EBS-supported API (11.5.10 on Solaris 10 / Oracle 9iR2) that would enable the professional user who launched a (PL/SQL) concurrent process to monitor its progress over several hours from his/her application UI? To add to the fun, the process is going to spawn some slaves to make use of all the spare CPUs / cores / threads we have lying around.

As a developer, I would normally start with the DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure (and I'll build that in anyhow) - but in this case I'm struggling to find any documentation or ML notes. to point me at something that would actually appear on the apps UI.

Answers here - or better still, on the OTN thread. Thanks in advance!

ETL patent case: Constellar and DataMirror let off off the hook; DataStage still in dock

Mon, 2009-07-06 14:27
Once again Vincent McBurney delivers a fantastic summary of the latest state of the Juxtacomm ETL patent case: SQL Server, DB2 and DataStage will fight out Data Integration Patent Infringement.

I'm most interested from the Constellar point of view - I first came across Constellar (then Information Junction) as a product on sale in late 1993 / early 1994 (before joining the company from Oracle in 1995), so it always seemed clear to me that it would qualify as prior art to Juxtacomm's 1998 patent. Oddly, it seems that the parties to the trial have agreed that Constellar Hub (and DataMirror Transformation Server) can be dropped from consideration; they won't be subject to damages - but equally they won't be considered as prior art. I don't understand that, but I guess the IBM lawyers must know what they are doing.

So, the case rumbles on, serving (if nothing else) to show how broken the US software patent system is.

SQLstream delivers instant data stream analysis of Mozilla 3.5 downloads

Wed, 2009-07-01 13:52
Here are a couple of posts that describe the download monitor/dashboard which is giving up-to-the-second statistics for downloads by country of the latest Mozilla release 3.5 (just about to top 5.5 million downloads since yesterday's launch). The dashboard has been put together with the help of my friends at SQLstream. Just don't try looking at this with Internet Explorer, as it doesn't support HTML5.

Julian Hyde on Open Source OLAP. And stuff.: SQLstream powers ...
By Julian Hyde
SQLstream gathers data from Mozilla's download centers around the world, assigns each record a latitude and longitude, and summarizes the information in a continuously executing SQL query. Data is read with sub-second latencies, ...
Julian Hyde on Open Source OLAP.... -
SQLstream the Sequel - RealTime Intelligence for Mozilla BI in Action
From ebizQ Presents BI in Action Virtual Conference ...

Be Alert!

Mon, 2009-06-15 16:00
Here's a tale of woe from an organisation I know - anonymised to protect the guilty.

A couple of weeks after a major hardware and operating system upgrade, there was a major foul-up during a weekend batch process. What went wrong? What got missed in the (quite extensive) testing?

The symptom was that batch jobs run under concurrent manager were running late. Very late. In fact, they hadn't run. The external scheduling software had attempted to launch them, but failed. Worse than that, there had been no alerting over the weekend. Operators should have been notified of the failure of critical processes by an Enterprise Management (EM) tool.

Cut to the explanation:

As part of the O/S upgrade, user accounts on the server are now set to be locked out if three or more failed attempts to login are made. Someone in operations-land triggered a lockout on a unix account used to run the concurrent manager. And he didn't report it to anyone to reset it. So that explained the concurrent manager failures.

The EM software that should have woken up the operators also failed. Surprise, surprise: it was using the same (locked-out) unix account.

And finally, the alerting rules recognised all kinds of warnings and errors, but noone had considered the possibility that the EM system itself would fail.

Well, it's only a business system; though a couple of C-level execs got their end of month reports a couple of days late, and there were plenty of red faces, nobody actually died...

Just keep an eye out for those nasty corner cases!

Oracle Exadata posts #1 TCP-H result

Sun, 2009-06-07 07:01
Grag Rahn's Structured Data blog provides the data that Kevin Closson had to remove from his own blog. From an HP/Oracle point of view, a very good performance, reducing cost/QphH by a factor of 4.

However, it is interesting to see that the HP/Oracle solution is still more than 4 times the cost/QphH of the #2 placed Exasol solution (running on Fujitsu Primergy, and reported a year ago) - while the absolute performance improvement is relatively slight (1.16M queries/hr against 1.02M).

Flutter is the new Twitter

Tue, 2009-04-21 16:23
Worth a look, for those (like me) who find 140 characters too much to hande. Hat-tip to the BCS Oddit blog

Doubly dynamic SQL

Sun, 2009-02-22 06:36
It is great to see a new post from Oracle WTF last week, after a quiet period. Which reminded me to post this example of a dynamic search.

I won't post the whole thing, and I have disguised the column names to protect the guilty. The basic problem is that the developer didn't quite understand that if you are going to generate a dynamic query, you don't have to include all the possibilities into the final SQL.

Let's say the example is based on books published in a given year. First, to decide whether to do a LIKE or an equality, he did this:

' WHERE' ||
' (('||p_exact||' = 1 AND pdc.title = '||chr(39)||p_title||chr(39)||')' ||
' OR ('||p_exact||' = 0 AND pdc.title LIKE '||chr(39)||l_title||'%'||chr(39)||')) ' ||
' AND ('||p_year||' = 0 OR pdc.year = '||p_year||')' ||

So at runtime you get both predicates coming through. Suppose you wanted an exact search (p_exact=1), for p_title='GOLDFINGER'. We don't know the year of publication so we supply 0. The generated predicates are:

WHERE ((1 = 1 AND pdc.title = 'GOLDFINGER')
OR (1 = 0 AND pdc.title LIKE 'GOLDFINGER%'))
AND (0 = 0 or pdc.year = 0)

Wouldn't the logically equivalent:

WHERE (pdc.title = 'GOLDFINGER')

have been much easier? Add a few of these together and a nice indexed query plan soon descends into a pile of full table scans and humungous hash joins. Oh, and no use of bind variables, so in a busy OLTP application this could sabotage the SQL cache quite quickly.

My favourite part though is with the sort order. The user can choose to order by a number of different columns, either ascending or descending:

l_order := ' ORDER BY ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher asc'' then publisher end asc, ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher desc'' then publisher end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type asc'' then book_type end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type desc'' then book_type end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title asc'' then title end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title desc'' then title end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year asc'' then year end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year desc'' then year end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id asc'' then book_id end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id desc'' then book_id end desc';

Yes, you got it; given that the variable p_sort has been picked from an LOV, the whole piece of PL/SQL can be replaced by:

l_order := ' ORDER BY ' ||p_sort;

That looks better, doesn't it?

Tibco RV in a box - would appliances help streaming SQL?

Thu, 2009-02-05 04:23
Several others have posted on the new Tibco Messaging Appliance - apparently it's Tibco Rendezvous (RV) in a box OEM'd from Solace Systems. Paul Vincent ponders at the Tibco CEP blog:
    It’s quite feasible that the same approach could be used for “basic” complex event processing operations, especially those that don’t require history (or much persistence)
Paul is certainly right that CEP and event/data stream processing engines could benefit from the appliance approach. However I think that it is precisely those applications that require history that could most benefit. A high-speed streaming engine with a co-located database (running with much of the data in cache) gives the perfect environment for answering a wider range of "right-time" BI requirements; the engine can control exactly what needs to be retained and for how long to provide the appropriate time-series data to support whatever aggregated outputs are desired. The database acts as a memory extender (allowing infrequently accessed data to be paged out), as a source of replay / recovery data, and as a history for newly defined streaming queries. The latter is most important - if you want a 1 hour moving average, most streaming queries won't give you the right answer until the end of the first hour.

Analytics as a Service

Mon, 2009-02-02 04:23
With all this talk of SQLstream's recent v2.0 launch, I was interested to read Tim Bass's CEP blog posting on Analytics-as-a-Service. He calls it A3S - and rightly avoids calling it AaaS; apart from the fact the X-as-a-Service is almost as cliched as XYZ 2.0 (and equally meaningless), careless use of that sequence of As and Ss could cause spam filters round the world to get over-excited.

If we must have the as-a-service tag, I'd like to trademark BI-as-a-service: BIAS. Apart from being a proper acronym, it also gets across that BI often gives you the answers you think you want - not necessarily the ones you really need.

Shared Feeds

Sun, 2009-02-01 10:07
My posting rate has been quite low recently, but I have been enjoying using Google Reader to keep up with what everyone else is saying.

I've been sharing items with some friends / colleagues, but it seems harmless to open this up to the world. The topics are eclectic, but mainly around product development, event stream processing, RDBMS and data warehouse. Maybe the very occasional Dilbert or xkcd. You're all very welcome to see what I've shared - the links are now on the right hand side of this blog. And if you too are using a reader, here is a direct link to my shared items and here is the atom feed for them. If Google could just pack up my daily shares as a single blog post, wouldn't that be great.

I'm still successfully avoiding Twitter... I've no idea how I would find the time, and having seen Philip Schofield mentioned as a user (on a low rent BBC Sunday afternoon show) - and apparently now he is London's #5 tw*t(terer), so it must be hopelessly uncool anyway.

SQLstream launches v2.0 of its Event Stream Processing engine

Sun, 2009-02-01 08:55
As well as working as a freelance Oracle consultant, I have spent most of the last year working with SQLstream Inc on their Event Stream Processing engine, version 2.0 of which has now been launched.

My initial point of contact was Julian Hyde, with whom I worked in the Oracle CASE / Oracle Designer team in the early 90s. He went out to Oracle HQ and worked on bit-mapped indexes, then spent time at Broadbase before becoming best known as the founder-architect for the Mondrian OLAP server.

Event Stream Processing (ESP) is a development of what we might have called Active Database a few years ago. Rather than running queries against stored data as in an RDBMS, we register active queries which can be used to filter data arriving from any kind of stream. These active queries behave just like topic subscriptions in a message bus - except that unlike typical JMS implementations, the SQLstream query engine can do more than just filter. SQL queries against the data-in-flight can also:
  • join data between multiple streams, or between streams and tables
  • aggregate data within a stream to give rolling or periodic aggregates based on time or row counts
  • apply built-in and user-defined functions to columns within the stream's rows
  • build a network of streams and views to support complex transformation and routing requirements
Queries are defined using SQL:2003 with minimal extensions (there are some SQL examples on the web site); so developers used to working with an RDBMS will find it easy to grok the relational messaging approach. Unlike a typical message bus, most transformation and routing takes place entirely inside the SQLstream environment, rather than being delegated to external applications.

My experience working at JMS vendor SpiritSoft convinced me of the value of asynchronous message-based techniques, which in the last 10 years have spread out from high-end financial systems to ESB implementations all over the place. Since the early 80s we have seen how the RDBMS swept all other forms of structured data storage away. Now SQLstream's relational messaging approach removes the impedance mismatch between how we store data, and how we process it on the wire. In principle, this architecture can subsume both message-oriented (ESB style) and data-oriented (ETL style) integration architectures.

It should be said that "other ESP engines are available". Oracle itself has two projects on the go: its own CQL which I believe is still internal, and Oracle CEP (the rebranded BEA WebLogic Event Server - which itself is (or was) based on the open source Esper project). These two development threads will no doubt combine at some point (perhaps they already have?). IBM also has two or three independent CEP (complex event processing) projects on the go.

I think the same thing will happen to ESP / CEP as happened to ETL/EAI tools in the last ten or fifteen years. For sure, the database/application server vendors (especially Oracle and IBM) will sell plenty of this software within their respective client bases. An Oracle CEP system that was (or could be) tightly integrated with the RDBMS itself - maybe executing PL/SQL as well as Java functions - would be an easy sell. However multi-vendor sites will be interested in an agnostic / vendor-independent tool as a basis for their integration efforts. Just as Informatica has carved out a place for itself in competition with Oracle's ODI and OWB and IBM's DataStage, so SQLstream and other ESP vendors can fight for the common ground. It will be very interesting to see how it all turns out.

See the SQLstream product page for background, plus posts from Julian Hyde, CTO of SQLstream and Nicholas Goodman, Director of BI Solutions at Pentaho.

PS: here's another post from David Raab.

Unexpectedly honest job posting

Fri, 2008-11-28 12:58
I recently joined the Oracle Connections group on Linked-In and I'm getting regular daily mails with job postings and searches for work. Mostly harmless, but I've just seen a great one:

(You may need to be a member of Linked-In and/or the Oracle Connections group to follow the link).

I think we've all been there, confRigurating away to our heart's content, haven't we? It certainly explains a lot of the problems we see in production.

OT: 3 Mobile Broadband doesn't much like Gmail and Blogger

Sun, 2008-11-02 08:36
I use 3 mobile broadband in the UK (and Italy) and I have only a couple of nags about it:
  1. I can't use FireFox 3 and Gmail together over mobile broadband - I have to switch to IE7 and sometimes I have to downgrade Gmail to the simple HTML version. I have 2 gmail accounts (one work, one private) and the problem seems to be worse on the latter. The symptom is that the loading bar is followed by a blank screen and the status "Done".
  2. I simply cannot seem login to over mobile broadband - hence no posts during the week (probably a good thing as I'm supposed to be working). I get a 404.

These problems seem to persist whether I am in Italy with no bars on my reception, or in the west end with 5 bars. Does anyone have any idea what's going on?