Getting Started with the Apache Derby Open Source Database in CF8+

Last Updated: Jun 08, 2012

If you've not noticed, one of the many new features added in CF8 was the Apache Derby database, embedded within CF. Derby is a full-featured database with a 10+ year heritage (formerly known as Cloudscape, and released to open source by IBM, who had acquired it from Informix, and also now distributed by Oracle--formerly as Sun--as JavaDB.)

The point, with respect to CF, is that this DB is now embedded within CF, from CF8-onward. It can either be run within the CF/JVM process or it can be run out-of-process. Whatever you may find discussed about Java DB (or formerly Cloudscape) applies to Derby. As I'll discuss below, this Derby database is far more robust than most would presume (for a "free, embedded database"). It's not just a toy, and it's not a "single-user db". More below.
In addition to the materials below, you may also want to view the hour-long recorded presentation I gave at Max 2008, "Using Apache Derby: the open source DB embedded in CF 8". For more info, see this blog entry.

Here are topics in this resource list:

Intro: Why Consider Derby at All?

For those not familiar with the Apache family of open-source projects, don't confuse this with the web server of the same name (which is just one of many, many open source projects hosted on the apache.org site.)

I'm sure as CF folks start exploring Apache Derby they'll want to know more. I've pulled together this page to serve as a resource to help others. If you'd like to make any additions or corrections, let me know.

Derby is fully transactional, secure, easy-to-use, standards-based — SQL, JDBC API, and Java EE — yet small, only 2MB. It has the advanced features you'd expect in any quality DBMS: stored procedures, triggers, views, referential integrity constraints (including cascading deletes), multi-user capable, cost-based query optimization, transactions, savepoints, schemas, encryption, and much more. It can even run entirely read-only off of a CD.

The Apache Derby project has a strong and growing community that includes developers from large companies such as Oracle/Sun Microsystems and IBM as well as individual contributors.

One of my main motivations is simply to get people to start talking about the possibilities that it opens, being embedded within CF. I allude to a couple possibilities in the "why use Derby" question below. In fact, I've added a new section to the bottom to track any clever ideas others relate on how they're using Derby within CF.

General Resources for Learning More about Derby

Here are some resources to learn more about Derby: If you know of some other cool resources (especially not listed within the sites above), feel free to share.

As mentioned in the opening introduction, I also did a presentation at Max 2008, "Using Apache Derby: the open source DB embedded in CF 8". You can view the hour-long recorded presentation. More info here.

And of course, as you use Derby with CF, please do chime in with your experiences on the Derby section of ColdFusionCommunity.org.

Is it a development-only DB? How does it perform and scale?

No, it's not just a development DB. First, yes, of course you can build production applications with it (though redistribution would be according to the Apache license). But as for its performance capability, there's in fact a PDF of a presentation comparing Derby to MySQL and others as well as Performance Analysis of Apache Derby. Still another is available from IBM. See the next section on clustering.

Can Derby be used in a Cluster?

While not inherent in Derby as deployed with ColdFusion, since Derby is an open source project with strong community support, there are solutions that have been created to support clustering of Derby. Some of the projects to look at are:
  • C-JDBC, "open source (LGPL) database cluster middleware that allows any Java application (standalone application, servlet or EJB container, ...) to transparently access a cluster of databases through JDBC(tm). The database is distributed and replicated among several nodes and C-JDBC balances the queries among these nodes. C-JDBC handles node failures and provides support for checkpointing and hot recovery."

    Here are a couple of resources specifically on clustering Derby with c-jdbc:
  • Sequoia, a continuation of c-jdbc (with Apache license)
  • Tungsten, a coming stack of scale-out services, built atop Sequoia

Is this the same database engine that's embedded in Adobe Air?

No, that's yet another open source (indeed, public domain) DB, called SQLite (not a typo: it's spelled with one "L"). One may ask why the different Adobe teams chose different open-source embedded DBMSs. It could be that the different groups weren't aware of each other's decision and simply chose what seemed best for them. (Is Derby embedded in CF the same way SQLite is in Air? Well, that seems semantics. Yes, Derby is embedded in CF. The full DBMS is there. Nothing to install. Yes, you have to create a DSN (more later), but that's it.)

Could Derby have been used for Air? Sure. Could SQLite have been used as the embedded DB for CF? It seems so. The two are very similar in being small, embedded, yet highly functional multiuser database engines.

There's a comparison of the two on the SQLite site, though obviously it has a SQLite perspective. For instance, it says "Derby only allows a single process to have the database open at a time in its embedded mode. However, Derby also offers a full client/server mode." That's a common misconception, which I'll address in the next question.

I've heard that Derby is a single-user DBMS

That's a common misconception, and some of it stems from the fact that it can run on its own, embedded in another Java application or server (like ColdFusion), or it can operate using another feature called the Derby Network Server.

Let's look first at the simpler embedded form: Derby itself provides no communications abilities of its own, which helps keep it lightweight. As such, it can't be communicated with from outside applications, indeed it can accept requests from within that same JVM process. Does that mean it's single-user? Of course not. CF (like most web app servers) is multi-user, and IT handles that multi-user processing. Derby itself has no problem processing multiple requests within the CF process. It just can't receive them from outside of CF.

So why might a CF user care about the Network Server feature? Well, if you wanted to talk to Derby from another Java process, then you need it. You may think this applies only to other Java applications on the server or clients, but here's one that may trip you up: an IDE. The ij tool (mentioned later), for example, can't talk to your embedded database without enabling the Network Server.

That latter point, and indeed more on this whole question of "is it a multi-user database or not?" is covered in Chapter 1 of the Derby book I'd mentioned, whih has a corresponding web site including that chapter 1 as a pdf. (It's also available in in HTML form elsewhere. Note it's 11 "pages" that you need to proceed through.)

The discussion of interest here is on the next to last page. I'd like to quote a bit:
When developers refer to Apache Derby as an embeddable database, they are referring to the fact that the Apache Derby database runs within a JVM process. Without the Apache Derby network server, there would be no networking services, data access outside of the embedded JDBC driver in the database engine, or other infrastructure requirements; this accounts for its small footprint.

Understanding what the embedded concept entails is critical when developing applications. For example, one common misconception that developers have when they work with Apache Derby as a standalone database is that it’s only a single-user database and does not have communication capabilities. They believe that it is a single-user, single-connection, single-threaded system and develop their applications accordingly. This is not true. Apache Derby as a standalone database can support as many connections as desired, so long as they are established from the same JVM hosting the Apache Derby engine.

For an Apache Derby database to be accessed from a process that resides outside the hosting JVM that loaded the Apache Derby database initially (even if the JVM process resides on the same server), you need to load the Apache Derby network server. Read that last sentence twice to ensure you understand it because it is often a source of confusion for Apache Derby developers when multiple JVMs reside on the same machine. The Apache Derby network server allows for communications between JVM processes. This means that this communication infrastructure isn’t solely required to communicate between machines; it is needed even if two different JVM processes reside on the same machine and want to talk to the same database.

How can we use it/enable it in ColdFusion?

It's built into CF8, which means both that it shows up as an option in the drivers list when adding a new datasource (in the CF Admin), and also the database engine itself if embedded within CF8. Nothing to install. Just start using it. In fact, the CF8 example DBs are now provided in Derby, no longer Access.

Do note, however, that if you setup a new datasource in the CF8 admin, you will have 2 choices: "Apache Derby Client" and "Apache Derby Embedded". Which should you use, and how and why? The embedded version is simplest, as we'll see in a moment, and works fine even for use by very large multi-user applications running on CF. But if you want to be able to access the database from tools outside of CF, such as an IDE, you will need to use the Client mode and run what's called the Derby Network Server as well (which is simple). (Note, however, that you do NOT need to use the Client mode to be able to access Derby databases if you use an IDE that uses CF's RDS feature, such as in Eclipse with the Adobe CF8 Extensions, or from Dreamweaver, HomeSite+, or CF Studio. More on this in the next section.)

First, about the Embedded form, if you want use one of those CF 8 sample datasources just mentioned, or you choose to set up a DSN in the CF Admin that uses the "Embedded" form of a datasource (versus the "Client" form), there's nothing more you need to do to be able to use the datasource. As long as CF is up, its embedded copy of Derby will be able to respond to requests, and you can access the datasource using CFQUERY or any other CFML database code.

On the other hand, if you do want to set up a DSN in the CF Admin that uses the "Client" form of a datasource (versus "Embedded"), you will also need to start the Derby Network Server. You can learn more on that at the Apache docs on setting up Derby Network Server. You can start up the Network Derby server in CF by running: java -jar derbynet.jar start from within the directory where the derbynet.jar exists in a CF installation. In the standalone version of CF, that might typically be c:\coldfusion8\lib\. In a multiserver (multiinstance) implementation of CF, for the cfusion instance, as an example, that may be something like C:\JRun4\servers\cfusion\cfusion-ear\cfusion-war\WEB-INF\cfusion\lib\.

Once the server is started, you'll be able to create or verify a "client" derby datasource in the CF Admin.

Are there any IDE or Query tools for talking to Derby?

As you consider this question (or consider the answers of others) be careful to recognize that people may mean different things. Also, it depends as well on whether you are running Derby in "embedded" or "client" (or network) mode. More on that in a moment.

What do you really mean by an IDE or query tool?

First, someone may mean they want to access Derby from within a favored IDE they already use (like Eclipse or Dreamweaver), while others may not care and will be happy to use a separately downloaded tool. Some will want to use such a downloaded tool that they may already have, but want to know how to use THAT tool with Derby, while others will be happy to use any tool that helps solve the problem, even if new to them.

Second, and perhaps most important, recognize that when people speak of IDEs for a database, they may have very different needs in mind. Different tools will meet these needs at different levels of success, if at all. Are you looking to be able:
  • to easily see the databases in a server, and the available tables and columns in a database, and so on?
  • to see the data in a given table?
  • to be able to enter SQL statements manually and see the results of their executions?
  • visually build SQL query and update (DML, or data manipulation language) statements? perhaps with the tool showing available relationships between tables so as to build needed join clauses automatically?
  • generate CFC code from a table, to perform CRUD (create, read, update, delete) operations, that you can call from any code?
  • build complete HTML or Flex interfaces to perform CRUD operations?
  • visually build a new databases, tables, columns, relationships, etc (generating DDL, or data definition language, statements)?
With that introduction, different tools exist to talk to Derby, with varying levels of the above functionality.

What do you really want in terms of IDE intergration or a query tool?

At a bare minimum, you may be look for something that lets you look at a Derby database: see its tables and columns, view its data, and build SQL query/update statements. Some others let you build CFC code from tables. Still others also let you generate DDL to visually create new databases, tables, etc.

But a bigger question before we go much further is what sort of IDE/editor or query tool do you want to use? That will make a big difference in how things can be expected to work.

Do you want to connect to Derby using an IDE/editor designed to work with CF, like DW, CFB, CFE, CFS?

First up, let's talk first to those who use an IDE/editor designed to work with CF, whether ColdFusion Builder, or Eclipse with the Adobe CF 8 extensions for Eclipse, or Dreamweaver, HomeSite+, or the retired CF Studio. All these offer features to view databases and their tables and columns, view data, execute and generate SQL DML.

These tools can talk to any datasource defined within the CF Admin, and in the case of Derby databases, that's regardless of whether it's defined in the Admin as running in "embedded" mode (where Derby is running inside of the JVM along with CF) or "client" mode (where Derby runs in its own address space). When we talk about other IDEs and query tools, that will be a very different issue. More on that in a moment.

Before leaving the subject of CF-oriented IDEs/editors, note that none of them offer any visual tools to create new tables, columns, etc (DDL). Both the Adobe extensions for Eclipse and Dreamweaver have features to generate CFCs from tables, and the Eclipse extensions support generation of CFC code from a table, to perform CRUD (create, read, update, delete) operations.

For anyone who may not yet know about the query features of these editors and how to setup RDS in them and for your CF server, I'm afraid both are beyond the scope of this discussion. I can point you (and others here) to a recorded presentation I did just last month on all that: Leveraging RDS in Eclipse, DW, and HS+: Secure, Useful. You can also see a PDF of a talk I did a few years ago: CF RDS: From Stress to Success.

Do you want to connect to Derby from any other sort of IDE or query tool?

If you want to use any other sort of IDE or query tool (than those discussed above), note that these will ONLY work if you have Derby running in network mode, meaning that you have a separately running Derby instance. Since this page is oriented toward CF developers, what I mean here is that you must have defined the datasource in the CF Admin as being in "Client" mode (rather than "embedded").

The issue is that if the Derby DB you want to talk to is running within CF in "embedded" mode, it means that the Derby process is running inside the JVM. As long as CF is up and running, it will be effectively "holding a lock" on any such embedded datasource.

If instead it's running in network mode, then you will have to configure the IDE/query tool to point to the database as it is running on that separately configured and started implementation of Derby. (You could also simply try to point to the Derby DB by embedding it within the IDE or query tool, if that's an option, but only if CF is not holding a lock on it by being started and pointing to it as "embedded".

So what other IDE/query tools do you want to use to connect to Derby?

If you use Eclipse, you have several choices: There is also an open source query tool devoted to Derby called Squirrel, and a commercial tool called RazorSQL, which is not expensive and I have succeeded in using with Derby (and does provide DDL support). I see that DBVisualizer lists specific Derby support and does support visual creation of DDL.

Finally, any tool that supports JDBC will work. For instance, AquaData Studio doesn't list Derby directly but they do list support for "Generic JDBC Platform", and I've even found a couple of references to it perhaps supporting Derby. I list several other query tools at a new CF411 resource page I have with a section on Query tools. So does the free NetBeans IDE

There is also the DataMgr tool, which now has support for Derby. See other discussions.

I should add, finally, that there are also tools that are not specifically IDEs or Query tools, but are code generators that work with CF and databases, such as those I list in my Resource page on Code generators.

But again, this needs to be made very clear (as I hope I've done so above): all these other IDEs/query tools can generally ONLY work when the Derby database is running in what ColdFusion refers to as "network" mode, meaning Derby is running as its own process, and the tools are told to point to that, or if it's possible for the tool to point to the DB directly in the file system, that Derby DB can't be defined as an "embedded" datasource in an instance of CF that's running, as it will hold a lock on that.

Why use Derby database verses using something like MySQL?

Well, let me remind you of the resource I mentioned above comparing Derby to MySQL.

I think the primary benefit would be for those developing (or using) a CF app built to use Derby, where the developer can know that the DB will be there (as long as on CF8), and on any platform. Same for the users: they don't need to "install" a database. Sure, for many of us, this is trivial, and we like the freedom to choose among available DBMSs, but anyone developing packaged apps can take advantage of this.

Similarly, the CF8 example apps are distributed using it. That was Adobe taking advantage of the same benefit. I'm sure, over time, there will be still more uses.

For instance, I hope that Adobe might consider changing the default client storage from the Registry to a Derby DB (if not cookie). Certainly shops could choose it manually, in the meantime (though, sure, if they already have another DBMS installed they may prefer to use that for familiarity.) And clearly some shops would prefer anyway to create a different DSN per app for such a client var repository. I'm just saying it could be a matter of picking an app-specific one presumed to be Derby (without need for the user to create a new Derby database and DSN), assuming for some reason cookie might not be a better default. Just a thought.

There may be many other useful benefits from our having a DBMS embedded in CF, not the least of which is for those who are building apps to be deployed elsewhere. This fits very nicely with the WAR/EAR deployment mechanism of CF Enterprise (though that can just as well use any DBMS, of course.)

Beyond that, as for whether and why to use Derby over other DBMSs, I'd leave that debate for others to make. Certainly the Derby site will have many such articles making the case, and a google search for derby and mysql will turn up more. Just to clarify, I didn't create this page so much to evangelize the DB and convince people to use it, so much as just to create a place share resources with those exploring it now that it's included in CF8.

Where can I find a reference for supported SQL, etc.?

Both the Apache and IBM sites have complete reference documentation. See the Apache site's version of the SQL statement reference, Support for SQL-92 features, and Table of Contents.



Each of these resources have lots more documentation than what I've highlighted. See the tables of contents for more.

There's also a mapping of various levels of SQL-99 and SQL-2003 features mapped to Derby. This is part of a larger Derby wiki with lots of useful info.

Is there a way to automate creation of a new Derby Database in CF?

Well, do you really mean creating a database or a DSN? I'll assume you really mean the former (in addition to the latter). Both can be done via CFML.

Since CF7, we've had the Admin API which includes methods for creating DSNs, and which could therefore pass in the attribute needed to create a new database while also creating a DSN. See the CF docs which offer examples of using the Admin API to create a datasource, and you can apply the same technique to a Derby DSN.

Beyond that, though, Sam Farmer has released DerbyCFC, to assist in listing, creating Derby databases in CF.

Where else can I learn more from other CFers using Derby?

There was an article in one of the last issues of the CFDJ before it became defunct: Working with the Apache Derby Database and ColdFusion, by Chip Temm, though note that it came out before the final release of CF8, so the hassle of creating a new embedded DB (using create=true on the connection string) is no longer needed. There's now a checkbox in the DSN setup, but that's not the case if creating a Client DSN, where you must still use the connection string if creating a new database. He explains how to run the network version (if using a Client DSN). There's just one mistake. He says to start it with "ava -jar /ColdFusion8/lib/derbynet.jar -start". That should just be "start", not "-start". And if you're running with the multiserver mode, the derbynet jar will be deep in your instance, such as C:\JRun4\servers\cfusion\cfusion-ear\cfusion-war\WEB-INF\cfusion\lib.

Scott Stroz has also done a talk on it, Using Apache Derby - The built-in CF8 Database.

There have also been a few CF/Derby-oriented blog entries (some of which also were before the final release): There's even an article on using Derby with CF 7.02 (written before it was included in CF 8).

Finally let me point out that I've started an Derby community on the ColdFusionCommunity.org site. There you'll find people discussing Apache Derby from a CFML perspective. I'll note, however, that I've taken all the discussions that preceded December 2007 and turned them into this blog entry.

What are some of the clever ways others are using Derby in CF8?

I shared above a couple of ideas of how I can foresee folks using the embedded Derby database, but what about you? What clever ways are you (or might you consider) using it?
  • As a client variable repository (Charlie Arehart)
  • To create a packaged CF app in a WAR/EAR file, whereby no DBMS needs to be installed in the destination server (Charlie Arehart)
  • (your name here)
Send me your ideas or any other suggestions you have for this resource.




Managed Hosting Services provided by
http://www.edgewebhosting.net/
Managed Dedicated Hosting