How I Learned To Stop Worrying And ... er ... Accept Caché ObjectScript: July 2008

Tuesday, 29 July 2008

You know what I really wish?

I wish Caché had decent regular expressions. The same ones that Perl made ubiquitous and that are supported in C++, Java, Python, Ruby, PHP, JavaScript and every other language you can think of.

It's hard to write a wiki without good string processing and regexes.

Sunday, 27 July 2008

Intersystems' documentation for Caché is notoriously bad.

Here's a classic example I found today for the "super" keyword.

Yes, I realize "super" has something to do with super-classes and inheritance and so the "extends" keyword is relevant ... but ... er ... wouldn't you expect the "super" example to actually *use* the "super" keyword? Somewhere?

Or if there is no "super" keyword, because "extends" is how you declare inheritance relationships, what on earth is this page doing here?

Update : after further consideration, the only way I can decode this is that "super" is the name of the list of super-classes. That's why the super "keyword" can "have a value".

This is typical of Intersystems' often garbled approach to documentation. Don't they know what the word "keyword" actually means???

Wednesday, 16 July 2008

Cool. Someone at George James (who make Caché source-control software) has noticed this blog. :-)

When I first started using Caché I went looking on the web and found ... well, not what I was looking for.

Where are all the Caché blogs?

I thought.

Coming from a world of Python and Erlang and other scripting languages, which are also usually associated with web-applications and often the free-software / open-source movement, I'm used to a very loud bustling chatter as people ask for and offer help, create new applications, celebrate and criticise language features etc.

Arriving in the Caché ecosystem was, frankly, a bit weird. Like arriving in a ghost-town. Where are all the people?

George James was one of the few (well, actually, now I think of it, the only) specifically Caché blogs I found.

So where is everybody? There must be more of us out there? If you have a Caché related blog or wiki or site, drop me a comment and I'll be delighted to add it to my blogroll on the right.

Update : and of course, if someone from the community wants to point out that I've got it all completely wrong about iterators etc. then that's welcome too.

The Quest for an ObjectScript Iterator (part 5)

In which we take the last post's iterator and simply wrap it inside a Caché Objects class.

Define a class User.MultiIterator



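Class User.MultiIterator Extends %Persistent
{

// Current first-, second- and third-level keys into ^multi
Property i As %String;
Property j As %String;
Property k As %String;
// Value found at ^multi(i,j,k) by the last successful next()
Property val As %String;

/// Rewind to before the first record
Method startIterator()
{
  set ..i=""
  set ..j=""
  set ..k=""
}

/// Advance to the next ^multi(i,j,k). Returns 1 with i, j, k and val set, or 0 at the end.
/// Assumes ^multi is non-empty and that every i has at least one j below it;
/// otherwise the recursive calls never terminate.
Method next() As %Boolean
{
  // not started yet: find the first i, then try again
  if (..i="") {
    set ..i=$order(^multi(..i))
    quit ..next()
  }
  // new i: find its first j, then try again
  if (..j="") {
    set ..j=$order(^multi(..i,..j))
    quit ..next()
  }
  // normal case: step to the next k under the current i,j
  set ..k=$order(^multi(..i,..j,..k))
  if ..k'="" {
    set ..val = $get(^multi(..i,..j,..k))
    quit 1
  }
  // ks exhausted: move to the next j
  set ..j=$order(^multi(..i,..j))
  if ..j'="" {
    quit ..next()
  }
  // js exhausted: move to the next i
  set ..i=$order(^multi(..i))
  if ..i'="" {
    quit ..next()
  }
  quit 0
}

}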

Using it (in another file) is now :



new it
set it= ##class(User.MultiIterator).%New()
do it.startIterator()
while it.next() {
  write !,it.i_","_it.j_","_it.k_" : "_it.val
}

The logic is identical. The only difference is that I've now made the i, j and k indices and val properties of the iterator object rather than local variables passed by reference. It looks perhaps a little cleaner from the user's perspective. OTOH you have to create an extra class definition routine.

In the next post I think it's time to start thinking how we can add some filtering to the iterator.

Update : thanks to 80N for the correction. (Read comments)

Tuesday, 15 July 2008

The Quest for an ObjectScript Iterator (part 4)

Here's a three key iterator that runs through a global called ^multi :



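StartIterator(&i,&j,&k)
  ; Reset all three keys so that the next call to Next() starts from the top
  set i = ""
  set j = ""
  set k = ""
  quit
 
Next(&i,&j,&k,&val)
  ; Returns 1 with i, j, k and val set to the next record, or 0 at the end.
  ; Assumes ^multi is non-empty and that every i has at least one j below it;
  ; otherwise the recursive calls never terminate.
  if (i="") {
    ; not started yet: find the first i, then try again
    set i=$order(^multi(i))
    quit $$Next(.i,.j,.k,.val)
  }
  if (j="") {
    ; new i: find its first j, then try again
    set j=$order(^multi(i,j))
    quit $$Next(.i,.j,.k,.val)
  }
  ; normal case: step to the next k under the current i,j
  set k=$order(^multi(i,j,k))
  if k'="" {
    set val = $get(^multi(i,j,k))
    quit 1
  }
  ; ks exhausted: move to the next j
  set j=$order(^multi(i,j))
  if j'="" {
    quit $$Next(.i,.j,.k,.val)
  }
  ; js exhausted: move to the next i
  set i=$order(^multi(i))
  if i'="" {
    quit $$Next(.i,.j,.k,.val)
  }
  quit 0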
 

test() 
  new i,j,k,val
  do StartIterator(.i,.j,.k)
  while $$Next(.i,.j,.k,.val) {
    write !, i_","_j_","_k_" = "_val
  }
  quit

As you can see $$Next() starts getting somewhat complicated. Not only long but with various recursive calls to itself. Why is this?

Well, first, compare what the ordinary nested loop version of this would look like :



order()
  new i,j,k
  set i=""
  for { 
    set i=$order(^multi(i))   
    quit:i=""
    set j=""
    for { 
      set j=$order(^multi(i,j))  
      quit:j=""
      set k=""
      for {
        set k=$order(^multi(i,j,k))
        quit:k=""
        write !,i_","_j_","_k_" = "_^multi(i,j,k)
      }
    }
  }
  quit

For the iterator version, we have to find a way to flatten out that nested structure of $order statements. We also have to cope with the fact that inside our $$Next we don't have any state information except the values of i, j and k. We don't know if j="" because we haven't started looping yet, or because we just reached the end of the js under the current i. (This is something we do have, for free, in the nested loop version.)

And when we do do an $order on i, we have to go round and test the j again ... etc. I'm using recursion to do these tests multiple times because it makes the code shorter.

So why would we prefer the iterator version to the nested loops? Mainly because it decouples the business logic from the details of the data-structure. test() knows nothing of the name or shape of the global. Also, the iterator is reusable many times, but the nested loops will have to be reconstructed whenever we need to run through the global.

The same principles can be applied to create iterators for globals with more keys, although as the number of keys increases, the size and complexity of the $$Next() function also increases. The pattern remains the same though.
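
For comparison, here is a minimal Python sketch of the same idea generalised to any number of keys. It is only a model, not Caché code: a nested dict stands in for the global, the names walk and multi are made up for the illustration, and key order is plain Python sorting rather than Caché's collation.

```python
def walk(node, depth, prefix=()):
    """Yield (keys, value) for every leaf exactly `depth` levels down.

    `node` is a nested dict standing in for a global such as ^multi;
    sorted() stands in for the key order that $order would follow.
    """
    for key in sorted(node):
        child = node[key]
        if depth == 1:
            yield prefix + (key,), child
        else:
            # Descend one level, remembering the keys seen so far.
            yield from walk(child, depth - 1, prefix + (key,))


# Hypothetical data, shaped like ^multi(i,j,k)=val
multi = {
    "a": {"x": {1: "one", 2: "two"}},
    "b": {"y": {1: "three"}, "z": {7: "four"}},
}

rows = list(walk(multi, 3))
for keys, val in rows:
    print(",".join(str(k) for k in keys), "=", val)
```

A Python generator keeps its place between calls, which is exactly the state that $$Next() has to re-derive from i, j and k every time it is entered.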

Monday, 14 July 2008

The Quest for an ObjectScript Iterator (part 3)

This is the third part of the series on writing "iterators" in Caché ObjectScript that began with an introduction, and went on to describe Caché's database structure. The aim is to invent abstractions that let us loop through a global without knowing anything about its structure.

This time, I'll keep it short, and just show the simplest COS iterator. Because in these examples I am using Caché ObjectScript and not Caché Objects, I have no objects to encapsulate index state. So I will use call-by-reference to allow a single Next() function to return both the index and the value of each record through variables declared in the caller.

For a simple single key global (here called ^xs) the definition of the iterator is this :


StartIterator(&i)
  set i = ""
  quit
 
Next(&i,&val)
  set i=$order(^xs(i))
  if i="" { quit 0 }
  set val=$get(^xs(i))
  quit 1

And we can use it like so :


test()
  new i,val 
  do StartIterator(.i)
  while $$Next(.i,.val) {
    write !, i_" : "_val ; example "do something"
  }
  quit

You'll see that test(), which represents some kind of business logic, has no direct mention of ^xs and no commitment to its structure (except that there is a single index, i which steps through it).

OK, that's the basic idea. It should be reasonably self-evident if you've even started working with Caché. Next episode, I'll delve into the uglier problems of multi-key globals.

See? Told you this would be a quick one. :-)

Method Generators

Just noticed that you can do interesting compile-time code generation in Caché Objects using Method Generators.

Something I'll be investigating shortly.

Sunday, 13 July 2008

The Quest for an ObjectScript Iterator (part 2)

In this series of blog-posts (starting here) I'm exploring the issues of maintaining an abstraction layer between the Caché database and your business logic. Caché's tight integration of database and program environment tempts you away from this. And, in particular, the functions for accessing and looping through tables pull against it.

In this post, I'll give some background and explanations of how Caché ObjectScript programs see the database. Experienced Caché and MUMPS programmers probably know this already, but new arrivals from other worlds might find it informative.

The Caché database is structured as sparse, multi-dimensional arrays (known as "globals") containing chunks of data in strings. Because the arrays can use strings as indexes (i.e. the keys can be strings), meaningful information in a record is usually spread across both the keys and the actual value. But only keys are easy to search on.

This is different from a relational database where all fields are (at least from the SQL writer's perspective) more or less equal.

Let's create an example. A rather simplistic patient record might be stored something like this :


^Patient("general hospital",324542)=john~smith~malaria

where the hospital name is the first key, patient id is the second, and the actual data (first name, last name and disease) is encoded as sub-strings (known as "pieces" in Caché terminology) separated by the ~ character.
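
In ObjectScript, pulling a field out of such a string is the job of the $piece function. As a rough model of the idea, here is a Python sketch; the helper name piece is made up for the illustration, and it only covers the single-field case.

```python
def piece(record, delimiter, n):
    """Return the n-th delimiter-separated field of record, counting from 1.

    Returns "" if there is no such field.
    """
    parts = record.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


record = "john~smith~malaria"
first_name = piece(record, "~", 1)
disease = piece(record, "~", 3)
print(first_name, disease)
```

The position of each field is a convention that every reader and writer of the record has to agree on, which is one more piece of structure for the business logic to depend on.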

Such a database structure makes it easy, and very fast, to pull out data if you have all the necessary keys. To get this record from the database into a variable p :


set p = ^Patient("general hospital",324542)

It's also pretty simple to manipulate a subtree. For example there are operations which can copy an entire subtree to another variable.


merge gh = ^Patient("general hospital")

will grab all general hospital patients and put them into a subtree in the variable gh.

You can delete subtrees with


kill ^Patient("old hospital")

etc.

On the other hand, if you want to find all patients who have malaria, you have a slog. You either have to manually run through all the records checking which contain "malaria" in the disease field of the string. Or, if looking-up patients by disease is a common requirement that needs to be fast, you make a second array as a fast, searchable index, that is structured like this.


^PatientDiseaseIndex("malaria","general hospital",324542)

And make sure you keep it in sync.
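
To make the "keep it in sync" burden concrete, here is a small Python sketch. It is a model under stated assumptions, not Caché code: two plain dicts stand in for ^Patient and ^PatientDiseaseIndex, and the function names save_patient and patients_with are hypothetical.

```python
patients = {}        # stands in for ^Patient(hospital, id) = record
disease_index = {}   # stands in for ^PatientDiseaseIndex(disease, hospital, id)


def save_patient(hospital, patient_id, first, last, disease):
    """Write a record and its index entry together."""
    old = patients.get((hospital, patient_id))
    if old is not None:
        # Remove the stale index entry in case the disease has changed.
        old_disease = old.split("~")[2]
        disease_index.get(old_disease, set()).discard((hospital, patient_id))
    patients[(hospital, patient_id)] = "~".join([first, last, disease])
    disease_index.setdefault(disease, set()).add((hospital, patient_id))


def patients_with(disease):
    """Look up by disease using only the index, with no full scan."""
    return sorted(disease_index.get(disease, set()))


save_patient("general hospital", 324542, "john", "smith", "malaria")
save_patient("general hospital", 324549, "martha", "jones", "measles")
save_patient("local clinic", 2323, "donna", "noble", "flu")
save_patient("local clinic", 2323, "donna", "noble", "malaria")  # diagnosis changes

malaria = patients_with("malaria")
flu = patients_with("flu")
print(malaria, flu)
```

Every write has to go through save_patient for the index to stay honest. Any code that sets the record directly leaves the index stale, which is the argument for an abstraction layer in the first place.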

Iterating through these multi-dimensional arrays is baroque, to say the least. Caché ObjectScript provides two functions, $order and $query, for looping through tables.

$order takes as argument an array descriptor (name and keys), and returns the next key at the same level of hierarchy as the right-most key listed in the array expression.

Huh?

To make that last sentence clearer, here's an example.

Let's suppose we have three patient records :


^Patient("general hospital",324542)=john~smith~malaria
^Patient("general hospital",324549)=martha~jones~measles
^Patient("local clinic",2323)=donna~noble~flu

Calling the $order function like this :


$order(^Patient("general hospital",324542))

will return the value 324549. Why? Because 324549 is the next key at the "patient id" level of the key indexes.

Similarly


$order(^Patient("general hospital"))

will return the string "local clinic", because here we're only giving the top-level key of ^Patient. And the next key after "general hospital" is "local clinic".

Using $order, then, it's possible to loop through each key at a particular level of the hierarchy. It also knows how to find the first key at any level; you simply pass it an empty string. So


$order(^Patient(""))

returns "general hospital", the first top level key. And


$order(^Patient("general hospital",""))

returns 324542, the first second level key below "general hospital".

When the $order runs out of keys at any particular level of a subtree, it returns an empty string.

For example,


$order(^Patient("general hospital",324549))

returns "", which signals to us that there are no patient ids after 324549 in the "general hospital" subtree.
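
If it helps to see those rules in a more familiar language, here is a Python model of them. It is a sketch only: a nested dict stands in for ^Patient, the function name order is made up, you pass it the dict for one level rather than a full array reference, and sorting is Python's rather than Caché's collation.

```python
from bisect import bisect_right


def order(level, key):
    """Return the key after `key` among the keys of `level`, or "" at the end.

    Passing "" asks for the first key, mirroring the convention described above.
    """
    keys = sorted(level)
    position = 0 if key == "" else bisect_right(keys, key)
    return keys[position] if position < len(keys) else ""


# The three patient records from the example, as a nested dict.
patient = {
    "general hospital": {324542: "john~smith~malaria",
                         324549: "martha~jones~measles"},
    "local clinic": {2323: "donna~noble~flu"},
}

print(order(patient, ""))                            # general hospital
print(order(patient, "general hospital"))            # local clinic
print(order(patient["general hospital"], 324542))    # 324549
print(order(patient["general hospital"], 324549))    # empty string
```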

To loop through all records in the table we have to use nested loops. Typically something like this.


set hospital=""
for {
  set hospital=$order(^Patient(hospital))
  quit:hospital=""
  set id=""
  for {
    set id=$order(^Patient(hospital,id))
    quit:id=""
    set p = $get(^Patient(hospital,id))
    ... do something with patient p
  }
}

In other words, it's a bloody performance! Especially when you come from the sort of language where you're used to being able to write something like this :


for p in Patient {
  ... do something with patient p
}

But the problem is far more pernicious than simple verbosity. This code hardwires a great deal of commitment to the particular database structure. Let's suppose we realize at a later date that we really need to add a third key to Patients. For example, our hospital network expands into a neighbouring state and we now need to support a new structure :


^NewPatient(region,hospital,id)

Migrating the existing data is a bit of work. But now every single place in the code that loops through patients looking for records that match some criteria will have to be rewritten as well!

The alternative iterating function $query offers some help, but has its own bizarre qualities.

Like $order, the $query function takes an array name and keys. But it returns a string which contains the full array access expression of the next item regardless of the level of hierarchy. So


$query(^Patient("general hospital",324542))

will return a string containing "^Patient("general hospital",324549)"

This can then be evaled in the next statement. ObjectScript has an @ operator for eval, so we loop through the array like this.


  set q=$query(^Patient)
  for {
    if q '= "" {
      set p = @q
      ... do something with patient p
    }
    set q=$query(@q)
    quit:q=""
  }

As before, there's a way to get at the first record - the $query(^Patient), and when $query returns "" we've reached the end.

This is somewhat of an improvement in that we're back to one loop. And it would still work if we moved to a new structure for ^Patient. It's a minor inconvenience that we've got ourselves into a "for" which only tests for the exit condition at the end of the loop body so we need an extra test that q isn't "" for the actual "do something" part.

The bigger concern is that we've now lost our keys. The value of q is going to be something like "^Patient("general hospital",324549)" while the value of p is "martha~jones~measles". If, in the "do something", I want to know what hospital we're talking about I'm going to have to cut up the string q to extract it. That's a bit painful.
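
To show what that cutting-up involves, here is a Python sketch that splits such a reference string back into its name and keys. The function name split_reference is made up, it assumes a well-formed reference, and it treats unquoted keys as integers. ObjectScript does, as far as I know, offer $qsubscript and $qlength for this job, so check the documentation before hand-rolling anything like this.

```python
def split_reference(ref):
    """Split a reference like ^Patient("general hospital",324549)
    into its name and a list of keys.

    Quoted keys become str (a doubled "" inside quotes is one literal quote);
    unquoted keys are assumed to be integers.
    """
    name, _, rest = ref.partition("(")
    name = name.lstrip("^")
    if not rest:
        return name, []
    body = rest[:-1]                        # drop the closing parenthesis
    keys, i = [], 0
    while i < len(body):
        if body[i] == '"':
            i += 1
            chars = []
            while True:
                if body[i] == '"':
                    if i + 1 < len(body) and body[i + 1] == '"':
                        chars.append('"')   # escaped quote
                        i += 2
                        continue
                    i += 1                  # closing quote
                    break
                chars.append(body[i])
                i += 1
            keys.append("".join(chars))
        else:
            j = body.find(",", i)
            j = len(body) if j == -1 else j
            keys.append(int(body[i:j]))
            i = j
        i += 1                              # skip the comma separator
    return name, keys


name, keys = split_reference('^Patient("general hospital",324549)')
print(name, keys)
```

Even with a helper like this, the business logic still has to know that the hospital is the first key and the patient id the second, so the structure has not really been hidden.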

In the next part of this series, I'll start showing some "iterator" routines which do successfully hide the structure of a table and yet give access to necessary key information.

Saturday, 12 July 2008

The Quest for an ObjectScript Iterator (part 1)

One of the advantages of Caché is the close integration of programming language and database. It's trivially easy to read and write directly to and from the db in Caché ObjectScript code.

But that ease has a downside : it's so tempting to shunt stuff in and out at a moment's notice that, if you're not careful, you can suddenly find yourself with an extremely tight inter-dependency between the database structure and your business logic. Hard commitments to the details of your database schema are scattered in hundreds of places throughout the code, making any change expensive.

The conscientious programmer will want to put some kind of abstraction layer or API between direct database calls and the rest of the business logic. Such a layer comes in handy if you find you need to add extra logging or transaction protection during record saves, for example.

But the nature of the database makes this a challenge for those used to other languages and ways of doing things. There are particular difficulties when it comes to looping through the tables, as ObjectScript's functions for this make direct reference to the schema.

In this series of blog-posts I'll describe my own ongoing quest for a decent way to abstract away from direct db access and for some sort of Iterator to let business logic run through collections of records without (much) knowledge of how they're stored.

Saturday, 5 July 2008

Here's an interesting example (watch the video) of bringing social-software ideas into the traditional enterprise : ESME on SAP.

Need to start looking into Zen.

At first glance it has some impressive widgets. Especially taking advantage of SVG for cute vector graphs etc.

Thursday, 3 July 2008

Just want to drive a spike through a development and code release process I will be following on this blog.

Last weekend I started writing a simple wiki in Caché ObjectScript and CSP, and tonight I polished off a couple of odds and ends.

Wiki is one of the simplest web-apps you can imagine and is often used to demonstrate new web-frameworks. So it was an obvious experiment to try in Caché. I was also surprised when I went googling for Intersystems Caché wiki that I only found Wikipedia entries but no wiki software for the platform.

So I rectified that. Altogether development has taken around three hours. Not too bad if not quite up with the 10 minute examples boasted by Django / Ruby on Rails etc.

I'll have more to say about this code in the next few blog-posts, both to explain how to use it, and some of the implementation decisions behind it. I'll also, I'm sure, be adding further features.

However, that can wait. For the impatient, here's the project on Google Code codenamed "Twistah". The software comes in one XML file (NooRanchWiki.xml), an exported COS "project". Once you download the file you can import it into any namespace using Studio's "Tools > Import Local" menu.

Then go to http://webserver/csp/namespace/view.csp for the HelloWorld page.

At the moment, it doesn't do much wiki markup. Only creating links is supported, for which you must use a [[PageName]] notation (double square brackets, and no spaces allowed at the moment).

It does have an overview of all pages (surprisingly easy) and RecentChanges which has been implemented in a slightly unusual way that I'll explain later.

OK, enjoy what's possibly Intersystems Caché and COS's only open-source wiki package. (BSD license for all you Gnuophobes)

Wednesday, 2 July 2008

OK, after mentioning some bad things about Caché ObjectScript, here are a couple of good things about Caché in general.

(That's not to say all the suckage is just the COS language, but we can come to other negatives later. This is a "positives" post.)

It's "alive" : I'm starting to think that, possibly, the BEST thing about Caché is that it's a living platform in the sense that Steve Yegge talks about here.

What makes it "alive" is that the code is kept in the data-base and is interpreted (or rather, compiled on a routine-by-routine basis) so that it can be changed without the whole stop-recompile-restart cycle for the server. There is a terminal shell through which you can inspect and interact with the database and code. (Create objects, call functions etc.) It's not a great one in the Yegge sense, but it's there and useful.

Caché does not obviously have "advice" as Yegge calls it, but the system I work on does have dozens of hooks, and I think the developers must have got the idea from somewhere. Does Caché have plugins and other ways to extend it? Well, quite a lot of the innards seem to be visible (introspection) and new features like the OO language and the JSP-like CSP-pages ultimately compile down into the core language, so it is extensible.

Keeping the source-code in the database gives rise to one obvious and infuriating problem. You can't use all the ordinary file-based tools that you're used to for things like source-management, difference comparisons, backups and deployment. This is a problem that Caché shares with (comparing the sublime with the ridiculous) Smalltalk. Like Smalltalk, the whole "environment" of code and database is kept in a single large file.

Perhaps the resemblance is not wholly accidental. Caché might, in some ways, be moving to occupy a similar niche to something like Gemstone - a kind of self-contained persistent-object world. And as it does so, it may acquire further similarities to Smalltalk environments. Certainly had Intersystems taken Smalltalk as their model for an OO layer, rather than the stereotypical Visual C++ / Java development environments of the 90s, a great deal of pain might have been ameliorated. (That's something I want to come back to, here I'll just note that it is a good thing about Caché that it's a living platform and with some resemblance to Smalltalk.)

Batteries included : The environment includes, in a single standard installation, the database (obviously), the development tools, runtime environment and web-server.

That's quite handy to set up. It doesn't mean it's easy to deploy the environment, but once the environment is deployed, you have most of what you need for a web front-end and a database backend.

CSP is more or less like every other *SP (JSP, ASP, PHP etc.) You write HTML, can call out to COS on the server (interestingly at both compile-time and run-time, so I guess it's possible to do metaprogramming at compile-time, though I need to try this) and there are some special tags.

There's also now a new browser-side component library called Zen, and AJAXy XMLHttpRequest communication behind the scenes. (Another thing I still need to play with.)

It's fast. Allegedly. But that's a plausible claim given how low-level the data-access is, compared to, say, a multi-layer system with an Object-Relational Mapping library over ODBC to a relational database.

It's simple. It is, actually. It's pretty simple to make stuff happen. You tend to have direct access to things rather than have to learn sophisticated frameworks, go through multiple abstraction layers etc. There's a downside to that, of course, (in flexibility and maintainability) but the value of simplicity of getting started, and building incrementally from simple prototypes shouldn't be discounted.

Tuesday, 1 July 2008

Here's something interesting : OpenEHR is an open electronic health-record standard.

Seems to be in Eiffel of all things?! Gosh!

Some MUMPS dissing and a more positive response.

I'm going to blog about what sucks, and what's good (enough) in Caché, although finally my focus will be on how I think it can be used well, rather than judging it.

But let's start with some negatives. From my perspective, after a year or so of ObjectScript programming, here are the most disturbing parts :

Dynamic Scope. Yep, this is truly, awesomely fearful. Dynamic scope means that this :



f() 
  new x
  do g()
  write x
  quit

g()
  set x = 5
  quit

do f()

will write the number 5 to the terminal.

Why? Although x is defined within the scope of f, it's visible (and writable) within g. Note, not because g() is within the lexical scope of f() but merely because it is called from it.

The potential for weird interdependencies abounds.

The "new x" at the beginning of f() can shadow (hide) any x that was defined previously on the stack. So it's possible to use this mechanism to *avoid* having the effects of dynamic scoping. Unfortunately, legacy code is often FULL of the stuff. Old MUMPS coders seem to be happy to rely on it to pass values into and out of functions.

Dynamic Scoping in large systems is absolutely the number one cause of Caché hell. Especially as it prevents you even *trying* to reduce interdependency. When encountering legacy code with references to unlocalized variables, you might be tempted to add the "new" statement to make them local. However this can break entirely different parts of the code which were relying on the value of the variable being set as a side-effect of the call.
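
Here is a toy Python model of that trap. Everything in it is an assumption made for illustration: the DynamicScope class and its methods are invented, None stands in for an undefined variable (where ObjectScript would raise an error on the read), and leave() has to be called by hand where a real quit would undo the new automatically.

```python
class DynamicScope:
    """A toy model of dynamically scoped variables with a `new` command."""

    def __init__(self):
        self.frames = [{}]          # one dict of variables per `new` level

    def new(self, name):
        """Shadow `name`: start a fresh, undefined copy in a new frame."""
        self.frames.append({name: None})

    def leave(self):
        """Undo the most recent `new` (what happens on quit)."""
        self.frames.pop()

    def set(self, name, value):
        # Write to the innermost frame that already knows the name,
        # whoever created it; otherwise create it at the outermost level.
        for frame in reversed(self.frames):
            if name in frame:
                frame[name] = value
                return
        self.frames[0][name] = value

    def get(self, name):
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)


def g(scope, localise):
    if localise:
        scope.new("x")              # the tempting "fix": make x local to g
    scope.set("x", 5)
    if localise:
        scope.leave()


def f(scope, localise_g=False):
    scope.new("x")
    g(scope, localise_g)
    result = scope.get("x")         # f reads whatever g left behind
    scope.leave()
    return result


legacy = f(DynamicScope())
after_fix = f(DynamicScope(), localise_g=True)
print(legacy, after_fix)
```

In the legacy arrangement f sees the 5 that g set. Once g localises its own x, f's x is never assigned, so a caller that quietly relied on the side-effect breaks.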

Left to right operator precedence : This is not so evil in principle. There's no reason that it's better for 2+3*6 to be 20 rather than 30. But it sure as hell catches you out if you are experienced in (and plan to stay habituated to) the way most languages do it. It's nastiest in boolean expressions, of course, where


if x = 1 && y = 2 {

is evaluated as


if (((x=1)&&y)=2) {
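
A tiny Python evaluator makes the consequence easy to check. This is just a model of strict left-to-right evaluation, with a made-up function name and only the four operators needed for the two examples above; comparisons yield 1 or 0.

```python
import operator

OPS = {
    "+": operator.add,
    "*": operator.mul,
    "=": lambda a, b: int(a == b),
    "&&": lambda a, b: int(bool(a) and bool(b)),
}


def eval_left_to_right(tokens):
    """Evaluate [operand, op, operand, op, ...] with no precedence at all."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


arithmetic = eval_left_to_right([2, "+", 3, "*", 6])

x, y = 1, 2
condition = eval_left_to_right([x, "=", 1, "&&", y, "=", 2])
print(arithmetic, condition)
```

With x set to 1 and y set to 2 the condition still comes out false, because the last step compares 1 with 2. The defence is to parenthesise every comparison: if (x = 1) && (y = 2).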

Significant Whitespace : And I don't mean like Python, I mean in the middle or at the end of lines.

For example


kill  for { quit:'$data(x) }

is OK code (bit pointless but syntactically OK, and has understandable semantics).

Whereas


kill for { quit:'$data(x) }

is a syntax error.

Obviously.

Aaaarrrgghhhh!

OK ... more soon ...

So what do people do with Caché?

A brief glimpse can perhaps be seen in Intersystems' Innovation Awards for 2008.
