Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write a BMP→array converter for hardcoded 1bpp sprites? #8

Closed
nmlgc opened this issue Jun 21, 2020 · 22 comments
Closed

Write a BMP→array converter for hardcoded 1bpp sprites? #8

nmlgc opened this issue Jun 21, 2020 · 22 comments
Assignees
Labels
enhancement help wanted Good issues for outside contributors.

Comments

@nmlgc
Copy link
Owner

nmlgc commented Jun 21, 2020

TH01 pellets are coming up next, and for the first time, we'll have the chance to move hardcoded sprite data from ASM land to C land. This raises the question of when we are going to do this The Right And Moddable Way, by auto-converting actual image files into ASM or C arrays during the build process. This converter would also automatically generate pre-shifted versions of any sprites that need them, without requiring modders to store the 8 versions of those sprites manually.

Such a minor thing is currently not covered by any of the subscriptions, probably to this missing option in the order form. Since it requires no RE skills and also collides with nothing else, it would be the perfect thing to be done by an outside contributor who is merely familiar with C/C++… if we still have any left, that is.
Any backers with outstanding contributions in the backlog might also choose to put their contribution towards that, in case they think that this is more valuable than more reverse-engineering at this point. Otherwise, I'll just hardcode TH01's pellets directly, for the time being.

Since TH05's gaiji are also stored in this hardcoded 1bpp form, I also consider this converter as a prerequisite for fully rebuilding TH05's ZUN.COM, as mentioned in the 2020-02-23 blog post. As it turns out though, the OMF .OBJ output mentioned in that post won't be necessary after all. Most of these sprite declarations lie in the middle of a translation unit anyway, and therefore need to be #included. Same for ASM land.


Some more, unsorted details I had in mind - everything here is still up for discussion though:

  • Ideally, we'd only have to read 1bpp ("monochrome") .BMP files, as saved by MS Paint – any non-1bpp formats are handled by other, non-hardcoded image formats, and are out of scope here. However, Paint is pretty much deprecated, and has a quite nasty rendering bug in 1bpp mode on my system:

    Paint 1bpp sprite bug
    (It's the exact spark sprite sheet from below, but some dots just aren't rendered?)

    It's also been getting more and more difficult to save this format from other tools. Therefore, the converter should also accept all other common uncompressed .BMP bit depths – but, of course, error out whenever any color other than #000000 or #ffffff is found.

  • This should simultaneously compile into a 16- and 32-bit binary, right from the start. The 32-bit build step already requires the ASM arrays to be present, as does the 16-bit build step for C arrays.

  • Yes, technically it wouldn't be necessary to compile this into a 16-bit Real Mode DOS program, since we could already output the C arrays during ReC98's 32-bit build step. However, people are looking forward to the eventual 16-bit-only build process, and I agree that it makes no sense to have the PC-98-only master depend on a 32-bit C++ compiler just for things like these.

  • Therefore, aim for maximum portability right from the start - the converter should compile with anything from Turbo C++ 4.0J up to modern Visual Studio and GCC/Clang compilers, so that future port authors have one less thing to worry about. Now, I could force C89 onto people now, but I really don't see why I should 🙂

    ReC98 itself is going to use the freeware Borland C++ 5.5 (from the infamous freecommandLinetools.exe) as its default 32-bit Windows compiler; I've just added that one to the DevKit. Turbo C++ 4.0J defines the subset of allowed C++ features anyway, and BCC55 still happens to be the most hassle- and bloat-free way to get any sort of 32-bit Windows C++ compiler to people. (Open Watcom unfortunately only comes second here.)

  • Rely on platform.h in the top level of this repo, adding new things there as necessary. For modern, sane compilers, we'd ideally only need to #include <stdint.h> there? I hope.

  • For integration into the future build system, the converter should be written as a library with a C/C++ API.

  • My original plan would have involved parsing sprite dimensions from a C header, recognizing the dots*_t types, as well as macros like PRESHIFT to figure out where to put the pre-shifted data, or *_H to indicate the height. However, that would not only significantly raise the complexity of this, it also can't communicate all necessary information: TH03's in-game score sprites have their dots stored upside down, but we'd obviously like to have them the right way around in the .BMP file. (Not to mention the fact that TH05's zun -G is written completely in ASM and doesn't need a C header anyway.)

  • 💡 And then you notice that having a C++ build system allows you to just #include the sprite declarations from their actual header, and then pass the necessary numbers to the converter via sizeof(). So, the converter is free to only output raw numbers for the array definition. Adding per-dimension initializer lists will catch even more errors at compile-time, and mzdiff will catch the rest 🙂

  • 💡 And then you notice that this doesn't depend on the build system at all. We can simply use the converter's main() function to call the conversion API for every sprite sheet in ReC98, with "hardcoded" paths, and run that from the Makefile. Then, we don't even need to design a command-line frontend.

  • No further dependencies – yes, no master.lib, but also no win32_utf8. The only encoding-sensitive call should be fopen(), so let's just handle Win32 Unicode via an overload for wchar_t.

  • The dots*_t types limit the width of individual sprites to 8, 16, or 32 dots, you don't need to support any other sprite widths. Pre-shifted sprites can even be restricted to 8 dots.

  • Inside the .BMP files, cels can be arranged in any left-to-right, top-to-bottom order. Their individual positions are then derived according to the dimensions of the input file. An example from TH05's gaiji sprite sheet:

    • Sprite size: 16×16
    • Sprite count: 256
    • Input .BMP size: 256×256
    • → Top-left corner of sprite #174 (0xAE) is at position ((0xAE % 16) * 16, ⌊0xAE / 16⌋ * 16) = (224, 160)
  • Preliminary file extensions for the generated files could be .csp and .asp. The future build system mandates that those are different from regular .cpp or .asm, because that's how it selects the compiler. Having distinct extensions also works well with .gitignore.

  • Whoever writes it gets to name it. (Within reason, of course…) 6 letters max, since we need to suffix it with 16 and 32 – if a directory contains both a .COM and a .EXE with the same name, even modern 64-bit Windows runs the .COM binary by default…

  • And, of course, it should output the obligatory // Generated by ??? from ???.bmp, do not modify directly comment in the first line of the file.


Currently, the converter would have to work correctly with the following sprites, and output the same arrays we currently got in the repo:

Game .BMP file C declaration / data layout Note
1 TH01 pellets dots16_t sPELLET[3][PRESHIFT][8];
2/4/5 TH02 pellets dots16_t sPELLET[PRESHIFT][8]; Reusing TH01's bitmap is probably not worth the complication of somehow getting rid of TH01's two decay sprites, which are no longer used in later games?
2/4/5 TH02 sparks dots16_t sSPARKS[PRESHIFT][8][8];
2 TH02 point numerals dots16_t sPOINTNUMS[13][8]; Yes, there's a 13th empty 8×8 sprite at the end. It's actually used in place of the multiplier, if the multiplier would be ×1. :zunpet:
3 TH03 score font dots8_t sSCORE[10][8]; Emitted data needs to be stored upside-down.
4/5 TH04 gray bottom part of pellets dots16_t sPELLET_BOTTOM[PRESHIFT][4]; Bottom gray part of pellets.
4/5 TH04 point numerals dots16_t sPOINTNUMS[12][PRESHIFT][8]; Also not worth reusing the first 10 sprites of TH02's bitmap.
5 TH05 gaiji dots16_t sGAIJI[256][16];
5 TH05 piano label font dots8_t sPIANO_LABEL[9][8];

(Technically, we'd also have to add the piano sprites for TH05's Music Room, but these are stored and used in such a compressed way that it defeats the purpose of storing them as bitmaps.)

@nmlgc nmlgc added enhancement help wanted Good issues for outside contributors. labels Jun 21, 2020
nmlgc added a commit that referenced this issue Jun 21, 2020
A future sprite converter (documented in #8) could then convert these
to C or ASM arrays.

(Except for the piano sprites for TH05's Music Room, which are stored
and used in such a compressed way that it defeats the purpose of
storing them as bitmaps.
@joncampbell123
Copy link
Contributor

I think the GIMP (GNU Image Manipulation Program) can still properly emit 1-bit bitmaps. Try that.

@joncampbell123
Copy link
Contributor

joncampbell123 commented Jun 21, 2020

As for reading bitmaps, BMP files are easy to write a parser for. They start with a BITMAPFILEHEADER (https://docs.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapfileheader) followed by a BITMAPINFO header. (https://docs.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-bitmapinfo)

The bitmap bits are at the offset given in the file header.

Bytes per scanline are rounded up to a DWORD in anything higher than 1bpp, or byte (8 bits) for 1bpp.

Bitmaps are stored with the scanlines upside down (file order is bottom to top).

@joncampbell123
Copy link
Contributor

I think I can start a bitmap to C unsigned char array converter. It will be in DOSLIB and I will provide a link to it.

@joncampbell123
Copy link
Contributor

joncampbell123 commented Jun 26, 2020

Initial quick work done.

The code loads the bitmap into memory. If the bit depth is 24 or 32, then it is converted to 1bpp on load.

The conversion is managed through a task struct, since, you said you wanted a library-like form you can incorporate if needed.

For debugging purposes, the only output right now is emitting the loaded bitmap back to a BMP file.

Currently compiles for Linux (makefile) and Open Watcom 16-bit and 32-bit MS-DOS.

https://github.com/joncampbell123/doslib/tree/master/tool/rec98/touhou

@nmlgc nmlgc assigned nmlgc and joncampbell123 and unassigned nmlgc Jun 26, 2020
@joncampbell123
Copy link
Contributor

BMP/BIN/C/ASM output working. No PRESHIFT yet. Code seems stable so far. OMF output will be last, of course.

https://github.com/joncampbell123/doslib/tree/master/tool/rec98/touhou

@joncampbell123
Copy link
Contributor

PRESHIFT done, at least the case where the [PRESHIFT] dimension is inner.

@joncampbell123
Copy link
Contributor

PRESHIFT inner and outer is done. No OMF yet.

@nmlgc
Copy link
Owner Author

nmlgc commented Jun 27, 2020

Great work so far, thanks a lot! Wouldn't even have thought about potential big-endian hosts myself. Just got a couple of points before this can be merged:

  1. <sys/types.h> and <sys/stat.h> do not seem to be used?
  2. <unistd.h> is not available on any Borland or Visual Studio compiler; those have the unbuffered I/O functions in <io.h>. The least complex option there would probably just be to use buffered I/O for .BMP files as well?
  3. It would be nice to make the debug output to stderr optional, via a verbosity flag.
  4. For library usage, could you move parse_argv() and main() to a separate translation unit? This will probably be easier to handle in build systems than making the function definitions optional via a new #define.
  5. Error handling could be improved a bit. Each failure case should at least come with an error message on stderr. Returning distinct error codes for each of them would be great, but not too necessary, since I'd like to have the error message output as part of the library as well.
  6. For ReC98, OMF output will at best be nice to have, and at worst not worth the complexity of (probably?) including DOSLIB's OMF library. So you really don't need to work on it, as far as ReC98 is concerned. But in case you'd still like to have this feature for yourself, could you make sure that I can still merge the code without it?
  7. Unicode on Windows… is probably not worth the trouble after all, giving that ReC98 is only ever run this with relative ASCII paths anyway. It would be manageable for the file opening calls by #defineing a custom TCHAR type depending on the _WIN32 and _UNICODE macros, we'd still have to convert the file name to some 8-bit encoding for the header comment in the
    file. So just going with chars is also fine.
  8. Got a few signedness warnings from Borland C++ 5.5:
    Warning W8012 bmp2arr.c 377: Comparing signed and unsigned values in function saveout_write_sprite
    Warning W8012 bmp2arr.c 402: Comparing signed and unsigned values in function saveout_write_sprite
    Warning W8012 bmp2arr.c 422: Comparing signed and unsigned values in function saveout_write_sprite
    Warning W8012 bmp2arr.c 478: Comparing signed and unsigned values in function rec98_bmp2arr_save_output
    Warning W8012 bmp2arr.c 523: Comparing signed and unsigned values in function rec98_bmp2arr_save_output
    Warning W8012 bmp2arr.c 563: Comparing signed and unsigned values in function rec98_bmp2arr_save_output
    Warning W8012 bmp2arr.c 656: Comparing signed and unsigned values in function rec98_bmp2arr_load_bitmap
    Warning W8012 bmp2arr.c 665: Comparing signed and unsigned values in function rec98_bmp2arr_load_bitmap
    

@joncampbell123
Copy link
Contributor

  1. Borland C++ is wrong. Those are comparisons of unsigned char vs unsigned int. Nothing is signed there.

@joncampbell123
Copy link
Contributor

  1. I'll go ahead and remove the OMF option.

@joncampbell123
Copy link
Contributor

  1. sys/types.h and sys/stat.h are needed for some of the functions when compiled with GCC on Linux.
  2. Another option is to add an #if defined(_MSC_VER) || defined(BORLAND_C) and an #include needed by those, then the #else can include what works with GCC and Open Watcom.

@joncampbell123
Copy link
Contributor

  1. The context needs an "enable debug to stderr", yes.
  2. The idea is that, if you call this as a library, you'd call the functions or set the members of the struct yourself, and wouldn't need the command line parsing.
  3. Agreed. I'll have it update a const char * pointer to an error string.

@joncampbell123
Copy link
Contributor

joncampbell123 commented Jun 28, 2020

  1. It would only make code more complex to support both char and wchar_t in the same code, agreed.

@nmlgc
Copy link
Owner Author

nmlgc commented Jul 3, 2020

  1. Wow, Borland C++ also complains about *dst++ = tmp:
    Warning W8004 bmp2arr.c 170: 'dst' is assigned a value that is never used in function memcpy24to1
    Warning W8004 bmp2arr.c 204: 'dst' is assigned a value that is never used in function memcpy32to1
    
    Yeah, no point in working around those issues.

  1. Yes, but the easiest way to add this code would be to simply add bmp2arr.c as a separate translation unit, without going through a separate library build phase. So we need some way to include this code without also defining a main().

I'll be getting the TH01 pellet pushes done this weekend, and ideally, I'd merge this before. Do you have the time to complete the rest within the next few days, or would that have to wait a bit longer?

@joncampbell123
Copy link
Contributor

If you mean separate bmp2arr.c into a program and library, sure. Does the library need to be a .lib file or is a .c file ok?

@nmlgc
Copy link
Owner Author

nmlgc commented Jul 4, 2020

Yes, that's what I meant. A .c file is all we need.

@joncampbell123
Copy link
Contributor

Sorry for the long delay, it's done. bmp2arr.c is the program, bmp2arrl.c is the library. bmp2arrl.h is the header for the library.

@nmlgc
Copy link
Owner Author

nmlgc commented Jul 8, 2020

Thank you! Having this tool actually turned out to be a lot more necessary for the upcoming TH01 pellet work than I thought, so I'm going to merge this by tomorrow, even without the error handling in place. That would still be nice to have, though – keeping the issue open until then.

@nmlgc
Copy link
Owner Author

nmlgc commented Jul 8, 2020

Found two more things:

  • __BORLANDC__ needs to be __TURBOC__, which is #defined on both "Borland" and "Turbo" editions.
  • Casting the malloc() return values to unsigned char * would also make the code compile in C++ mode.

@nmlgc
Copy link
Owner Author

nmlgc commented Jul 8, 2020

  • Bundling preshift and preshift_inner into an enum (PRESHIFT_NONE = 0, PRESHIFT_INNER, PRESHIFT_OUTER) would make initializer list declarations a bit nicer to use.

@joncampbell123
Copy link
Contributor

joncampbell123 commented Jul 9, 2020

The code now looks for either __BORLANDC__ or __TURBOC__, and malloc/free have C++ friendly typecasting.

nmlgc added a commit that referenced this issue Jul 9, 2020
@nmlgc nmlgc closed this as completed in ec86682 Sep 7, 2020
@nmlgc
Copy link
Owner Author

nmlgc commented Sep 7, 2020

Implemented the rest as part of P0113.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted Good issues for outside contributors.
Projects
None yet
Development

No branches or pull requests

2 participants