Why is there no AI inside?

SQL Tran does not use AI, and for a very good reason.

AI, and by AI for the purposes of this discussion we are talking about LLMs such as ChatGPT, is great for the areas it excels in. However, the incredible hype around AI inevitable resulted in AI being used for things where its negatives become a problem. Code translation is one of the areas where AI is not a good choice, even though at a first glance it seems as AI is doing a pretty good job.

Reliability

AI works by indexing a lot of content available in books, whitepapers, and Internet. The way it answers your question is it has an index of most-likely tokens (parts of the words, not really syllables but fairly close) following a list of existing tokens. Sounds simple, but such index is built on a massive multidimensional matrix, on an unimaginably large scale. Now, it turns out that what we perceive as a great answer is something AI picks as not the most likely token to follow, but fairly likely. We give it a level of creativeness in picking the next token. This means not only we will always get a different answer to your question but that it is also creative in figuring out what comes next. LLM does not understand the question - but the output is extremely persuasive even when wrong because it is trained on huge text index and sentences (and code) are extremely believable. It's great to have a creative essay on a non-critical topic. A creative code snippet for a translation of your business logic, though? The generated code absolutely will introduce subtle and not so subtle errors. Having careless code processing your business transactions is... not a good idea.

On the contrary, SQL Tran has a fully deterministic engine. For the same input it will always reliably produce the exact same output. In case there is a problem we can pinpoint the exact cause, fix it, write tests such that it cannot occur in the future, ever again. And even better, SQL Tran can understand the code and generate many tests to ensure it is correct.

Performance

LLMs are not parallelizable. The AI engine builds the output tokens by generating one, then generating the next one based on the first one, then generating the next one based on the first two, and so on. You can't use GPUs to speed that up because GPUs actually have large latency and make up for it with enormous parallelism. There is no parallelism for LLMs so CPUs (or some future dedicated low-latency hardware) is the only thing that can be used. The options one can use on-prem are very limited, as is the quality of such LLMs.

Our parsing, static analysis, and translation engine is written from the ground-up, not relying on existing solution. Our parsing engine is much faster than any other parser on the market (with perhaps exception of some hand-written parsers). It is millions of times faster than any AI translation can possibly be. We parse millions of lines of code per second. We do a full parsing+static analysis+translation pass of a million lines of code script in seconds.

Assessment

With SQL Tran you get assessment out of the box. The way we do assessment is non-conventional - we simply translate everything and give you all the stats you can possibly need, as well as a detailed list of errors, warnings, and incompatibilities.

Cost

ChatGPT 4 training cost a staggering $63 million. Upcoming larger models will cost even more. That cost is absolutely transferred to the end-users in the form of API usage cost. Cost is per-token so the larger your prompt, the more costly it gets. The more context you provide for your prompt, the more expensive it gets.

Privacy

As mentioned in the Cost section, training of top of the line LLM models is extremely expensive. It is not feasible for anyone but the largest players in the space to do so and they won't be providing those models for customers to be used offline. This means that you will absolutely need to send your code to the cloud services for processing. Some companies are OK with accepting the privacy guarantees of cloud vendors, many are not - as there is no way to check the compliance.

SQL Tran is deployed on your own Azure tenant. Spectral Core has no access to your Azure resources. We have no way to access your code. We have no way to touch your SQL Tran instance. We have no way to touch your databases. We cannot even push a new version of the application as we have no access at all. (To upgrade, you will export your projects to encrypted file, publish a brand new instance of SQL Tran, import from encrypted file, then delete the old instance.) SQL Tran does connect to Spectral Core licensing service with minimal data needed to charge you for your usage.

Testing

AI cannot test your code for correctness, but we can. More precisely, we can verify that the translated code is doing exactly the same thing as your source code. Your bugs remain, we faithfully translate them to the new platform!

The way we test your code is as follows.

Tables, Views

  • We read all records from source and target staging database (in parallel, for performance) and compare each column value in each record.

  • In case there are differences, you will be able to view the first 1000 differences in a special grid with every difference clearly marked. Some minor differences are not important and are a side-effect of the different architecture between your source and target database engine.

Procedures, Functions

  • Each procedure (and function) is executed, then we issue a rollback. But, before we do a rollback, we compare a whole bunch of things, as described next.

  • In/out and out parameters are all compared

  • Affected tables (ones where our static analyzer reports you are writing to using insert, update, delete, or merge) are compared using our high-performance parallel engine in the same way we compare tables and views.

  • If you have SELECT statements sprinkled between your body statements (as is very often done for debugging and verification during development), we will compare all such datasets as well.

Triggers

  • Triggers are not verified separately - their impact is tested by testing procedures and functions .

Further alternatives

Besides AI and a full parsing and static analysis engine that we use, the third class of database code translation tools use regular expressions to transform the code. In our opinion, that's not a very serious approach as it introduces whole classes of issues, such as, but not limited to:

  • Differences in expression precedence rules. You will get wrong results of the calculations because of it.

  • Low quality of the translation, as this approach can do limited transformation

  • Transfer of complexity to the end-users (informing users they can extend thousands of regular expression rules with their own is, in our opinion, a point against such solutions and not for them)

  • No way to do any serious customization of name or type mapping

Last updated