Deciphering Scarface CSO Scripts

Introduction
Hello everyone, I'm going to start this post with a question: how many of you have played Scarface: The World Is Yours (2006)?
Yeah, I know it's a bit of an old game, but it gave me many hours of fun in front of the TV (on the PS2).
With the PC version of the game in hand, and combined with my reverse engineering studies, I decided to try to decode the game's script files.

Why?
Simple: there is no documentation at all about Scarface scripts on the internet.
What I discover might someday be useful to kick-start modding for the game.
Not to mention my huge desire to create a script that lets the player jump. Yes, JUMP.
The game's Tony Montana doesn't know how to jump. He can walk, shoot, drive and pilot a speedboat, but he can't JUMP (so frustrating, aaarrghhh).

Discoveries
Some of the game's script files are in .DSO format and others in .CSO format.
The first time I opened one of the scripts, I was pretty sure it was a Lua script because the structure was very similar, so I got excited. I just had to run LuaDec and the problem would be solved, right? Wrong! After a few minutes of analysis, I noticed some small differences in the structure and ruled out that possibility.

Below is a snippet of one of the scripts.


So what could I do? I'm not very familiar with old game formats, I was having a hard time understanding this one, and there was nothing on the web about how the game was developed. So there was only one thing left to do: dig into the game's history! From there, everything started to make more sense.
I learned more about the developer (Radical Entertainment) and its past games, and found out that they had used an engine that was a well-known reference for game creation at the time: the Torque Engine. This engine was later made open source, in 2012.

Soon, by association, it became clear that the game's scripts were actually written in Torque Script, more precisely in the "C Script" language, whose syntax is very similar to the C language we know.

Excellent! Knowing the language, it was time to study it and look for ways to decipher the scripts.
Luckily, I found 2 articles describing how to decode the scripts of 2 games: Frozen Synapse and Think Tanks.
These two articles were essential for learning the file format, especially the first one.

I was very happy, but once again that happiness was short-lived...
There was no compatibility between the 2 variants of "C Script", even though the overall format was the same. For example:
  • A field read as 4 bytes in one was read as 2 bytes in the other.
  • A count that had to be multiplied by 8 in one had to be multiplied by 4 in the other.
  • The order of the string and float tables was swapped.
  • Several bytes were removed, among other changes.
All this made the existing decompilers unusable; even between the two games (Frozen Synapse and Think Tanks) there was no compatibility whatsoever.

Anyway, I just had to face the problem and, using what I already had in hand, try to decode the Scarface files on my own.
I spent days and days trying to really understand how the code was organized, and finally, after many mistakes and few successes, I reached a conclusion that I am 99.999999% sure of.

The File Format
Below is the layout of the game scripts, in the format [size: description].
  • 4 bytes: Script Version (always 1)
  • 4 bytes: GS Table Size
  • X bytes: GS Table
  • 4 bytes: GF Table Size
  • X bytes: GF Table
  • 4 bytes: FS Table Size
  • X bytes: FS Table
  • 4 bytes: FF Table Size
  • X bytes: FF Table
  • 4 bytes: Code Size
  • X bytes: Code
  • 4 bytes: Size of the IDENT Table
  For each item in the IDENT Table:
      • 4 bytes: Offset
      • 4 bytes: Count
      • 4 bytes * Count: Locations
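To make the layout above concrete, here is a minimal Python sketch of a reader for it. It rests on assumptions I have not confirmed: every 4-byte field is a little-endian unsigned integer (typical for PC files), each Table Size and the Code Size give the byte length of the section that follows, and the IDENT Table Size is the number of IDENT entries. If Code Size turns out to count decoded code entries instead of raw bytes, the Code section would have to be walked with the FF rule to find where it ends. The function names are my own hypothetical helpers.

```python
import struct

# ASSUMPTION: every 4-byte field is a little-endian unsigned integer.
def read_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, new_pos) for the 4-byte integer at pos."""
    (value,) = struct.unpack_from("<I", data, pos)
    return value, pos + 4

# ASSUMPTION: a section's size field is its length in bytes.
def read_section(data: bytes, pos: int) -> tuple[bytes, int]:
    """Read a 4-byte size followed by that many raw bytes."""
    size, pos = read_u32(data, pos)
    blob = data[pos:pos + size]
    if len(blob) != size:
        raise ValueError(f"section truncated at offset {pos}")
    return blob, pos + size

def parse_script(data: bytes) -> dict:
    """Split a script into the sections listed above (hypothetical helper)."""
    version, pos = read_u32(data, 0)  # expected to be 1
    sections = {}
    # Same order as the layout: four tables, then the raw Code bytes.
    for name in ("GS", "GF", "FS", "FF", "Code"):
        sections[name], pos = read_section(data, pos)
    # ASSUMPTION: the IDENT Table Size is the number of entries.
    ident_count, pos = read_u32(data, pos)
    idents = []
    for _ in range(ident_count):
        offset, pos = read_u32(data, pos)
        count, pos = read_u32(data, pos)
        locations = list(struct.unpack_from(f"<{count}I", data, pos))
        pos += 4 * count
        idents.append({"offset": offset, "locations": locations})
    return {"version": version, "sections": sections, "idents": idents}
```

For example, parse_script(open(path, "rb").read()) returns the version, the raw bytes of each section and the list of IDENT entries, which can then be inspected one by one.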

Variables
  • GS: Global Strings
  • GF: Global Floats
  • FS: Function Strings
  • FF: Function Floats

To better understand what all these fields mean, I recommend reading the first article mentioned above, which explains them in more detail. To keep this post from getting too long, I'll leave that as homework for you.

The entries in the GS, GF, FS and FF tables are separated by null bytes (0x00).
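A null-separated table can be split with a few lines of Python. This sketch keys each entry by its byte offset inside the table, on the assumption (based on stock Torque, not verified for Scarface) that the Code refers to table entries by offset. The latin-1 decoding and the split_table name are my own choices.

```python
def split_table(blob: bytes) -> dict[int, str]:
    """Split a null-separated table into {byte_offset: entry}."""
    entries = {}
    start = 0
    while start < len(blob):
        end = blob.find(b"\x00", start)
        if end == -1:  # last entry has no trailing null byte
            end = len(blob)
        # ASSUMPTION: entries are single-byte text, so latin-1 never fails.
        entries[start] = blob[start:end].decode("latin-1")
        start = end + 1
    return entries
```

Keying by offset instead of by index keeps the lookup cheap if the Code does point into the table by byte offset; if it turns out to use indices, list(split_table(blob).values()) gives the index order.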

The Code section is read byte by byte, taking its size (Code Size) into account.
Whenever a 0xFF byte is found, the next 2 bytes must be read together as a single value.
There are cases where 2 FF bytes appear in a row, and this must be taken into account. For example:
In the byte sequence 48 00 FF FF 04 4D 33 34 there are 2 FF bytes in a row.
In this case, we must not read the 2 bytes that follow the first FF; instead, we read the 2 bytes that follow the second FF, which gives 04 4D. I don't know how much this affects reading the code during decompilation; in the other scripts I didn't notice any FF bytes in a row, nor any special handling for that case.
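The FF rule, including the FF FF case, can be sketched like this in Python. It illustrates my reading above and is not a verified decoder: the byte order of the 2-byte value is unknown, so little-endian is assumed here, and skipping the first of two consecutive FF bytes could misfire if a real 2-byte value happens to start with FF. decode_code is a hypothetical name.

```python
def decode_code(raw: bytes) -> list[int]:
    """Expand the raw Code bytes using the FF escape rule described above."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b != 0xFF:
            out.append(b)  # ordinary single-byte entry
            i += 1
            continue
        # "FF FF": ignore the first FF and use the 2 bytes after the second.
        if i + 1 < len(raw) and raw[i + 1] == 0xFF:
            i += 1
        if i + 2 >= len(raw):
            raise ValueError(f"truncated escape at offset {i}")
        # ASSUMPTION: the 2-byte value is little-endian.
        out.append(int.from_bytes(raw[i + 1:i + 3], "little"))
        i += 3
    return out
```

With the example sequence, decode_code(bytes.fromhex("4800FFFF044D3334")) yields five entries, the third being 0x4D04 (04 4D read as little-endian).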

If you read the articles mentioned above, you will notice some differences between each game's format.
For example, where Scarface reads 2 bytes after each FF byte in the Code, Frozen Synapse reads 4 bytes.
Also, Scarface scripts have no linebreak_count and no linebreak_pairs fields.

Byte mapping of one of the scripts
In the example below, you can see how the bytes of a small script are divided.

What's next?
In this thread I'm putting together all my discoveries about the script format, as well as my attempts to decode it.
I can't guarantee that I'll succeed, which is why I'm creating this topic: to record everything I've found about the scripts so that, if I can't finish the job, someone with more experience in the subject can, starting from a solid base of knowledge about the format.

Scarface is an incredible game. It has its flaws, but its qualities outweigh them, and it reminds us of our dear GTA.
There is no modding community for it, and unfortunately it is falling into oblivion.
The possibilities for this game are immense. From my research, native Torque Script would allow the creation of thousands of mods, both because it is easy to write and because it is extensively documented.

UPDATE - A decompiler tool has been made!

1 Comment

  1. This is just amazing. I wish you could multiply your efforts and build a community around this. It would be a total dev milestone, and it would keep a lot of users (I can assure you) playing the game again, looking for ways to hack it all around.
    Thumbs up to this, hope you keep doing it!

