Ever since reading the Sprite Tricks articles over at codetapper.com, I've wanted to try out a different effect that used horizontal sprite multiplexing. I was particularly intrigued by the 'Risky Woods' method of multiplexing, where two sprite channels got their position register updated every 16 pixels to allow for a 15 colour, 64 pixel repeating pattern background. Reading that article, it occurred to me it might be possible to do a different and potentially more interesting effect.

And that effect is a 3 colour sprite based background without the need for a repeating pattern.

Free Form Sprite Layer
Above: testing the Free Form Sprite Layer program in WinUAE
The basic idea is this: the Copper & Sprite hardware don't just allow for repositioning a sprite horizontally, they also allow the sprite data to be updated. This effectively means you can have a sprite channel repeat somewhere to the right of the first display but with a different image. The catch is that there need to be at least 24 pixels* between the original sprite and its copy for the Copper to be able to update the registers in time.

Now, 24 pixels is too many to allow a single sprite channel to cover the entire screen. However, there are 8 sprite channels. And if the Copper updates sprite data/positions as quickly as it can, it might be able to redisplay/reload enough sprites to cover the entire display. With this information and some quick back of the envelope calculation, I came up with a working configuration.
  • Method
  • Scrolling
  • Performance
  • Memory Use
  • Calculations
Method
The method starts by displaying all 8 sprites right next to each other using sprite DMA (covering 128 pixels), with the Copper starting to reposition sprites and reload their data as soon as the first sprite is being displayed. By the time pixel 128 is reached, the Copper has repositioned/reloaded data for five additional sprites (meaning the screen now displays 208 sprite pixels side by side, where every pixel can be different).

In the time between pixel 128 and 208, a further 3 sprites can be repositioned/reloaded. The screen is now covered up to pixel 256. In the time between pixel 208 and 256, the Copper will have managed to reposition/reload another 2 sprites. At this point, the screen is covered by 288 sprite pixels. Between pixel 256 and 288 the Copper can reload/reposition a further 2 sprites (thanks to 288 being divisible by 24) and thus 320 pixels can be covered by sprite data.
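
The coverage figures above can be reproduced with a small simulation. This is only a sketch of the arithmetic, not the actual Copper list: it assumes one reposition/reload completes exactly every 24 pixels starting from pixel 0, and that a sprite whose reload finishes just as the beam arrives can still be shown. The names `coverage_steps`, `RELOAD_INTERVAL` and so on are made up for this illustration.

```python
SPRITE_WIDTH = 16      # pixels per hardware sprite
DMA_SPRITES = 8        # sprites placed side by side by sprite DMA
RELOAD_INTERVAL = 24   # pixels the Copper needs per reposition/reload (assumed exact)


def coverage_steps(target_width):
    """Return the covered width each time the beam reaches the previous edge."""
    covered = DMA_SPRITES * SPRITE_WIDTH   # 128 pixels from sprite DMA alone
    reloads_used = 0
    steps = [covered]
    while covered < target_width:
        # Reloads the Copper has finished by the time the beam is at `covered`.
        reloads_done = covered // RELOAD_INTERVAL
        new_sprites = reloads_done - reloads_used
        if new_sprites == 0:
            break  # the beam has caught up with the Copper
        reloads_used = reloads_done
        covered += new_sprites * SPRITE_WIDTH
        steps.append(covered)
    return steps


print(coverage_steps(320))  # [128, 208, 256, 288, 320]
```

Each entry is the width covered once the beam reaches the previous entry, matching the 128, 208, 256, 288 and 320 pixel steps described in the text.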

An animation might make this more clear, so I’ve included a simplified display of what happens below.

Above: the Copper racing the beam to display more sprites.
It's still possible to reload/reposition even more sprites at this point, but I stopped as the screen size I've elected to use only required 288 or 304 pixels of sprite data to be shown to fully fill it.

Note here that the repositioning/reloading needs to be repeated on every scanline showing this effect and that each line needs to end with the copper repositioning all 8 sprites back to their original positions at the left of the screen.
Scrolling
In order to scroll the sprite layer, there are two basic options. The first is replacing the sprite data with a shifted image every time the background layer moves. The second is updating the position of each sprite in the sprite layer and updating the image data only using non-shifted tiles. Of these two, the second has a key advantage over the first: it allows the workload to be spread over more frames.

There are two elements to this method. The first is updating the X position of each sprite. Each sprite's position is comprised of two parts: the SPRxCTL* registers control whether a sprite's X coordinate is odd or even and the SPRxPOS* registers control the sprite's X position in increments of 2 pixels. The SPRxCTL register can be set separately from the SPRxPOS register, so it's possible to update a sprite's X position by one pixel by only accessing one of the sprite control registers instead of two.

The background layer scrolls more slowly than the foreground layer (it scrolls 1 pixel every other frame).

This means that there are four frames between SPRxPOS updates:

        SPRxCTL    SPRxPOS
Frame 0    0          8
Frame 1    0          8
Frame 2    1          8
Frame 3    1          8
Frame 4    0          7
... etc
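
As a rough illustration of the split described above, the sketch below separates a horizontal position into the part that goes into SPRxPOS (2-pixel steps) and the odd/even bit that goes into SPRxCTL. The function names, the starting X of 16 and the scroll direction are assumptions made for the example; the real registers also carry vertical position bits, which are left out here.

```python
def split_sprite_x(x):
    """Split a horizontal position into its SPRxPOS and SPRxCTL parts."""
    pos_part = x >> 1   # goes into SPRxPOS: position in 2-pixel steps
    ctl_part = x & 1    # goes into SPRxCTL: 1 for odd X, 0 for even X
    return pos_part, ctl_part


def scroll_sequence(start_x, frame_count):
    """Hypothetical scroll: X advances 1 pixel every other frame."""
    return [split_sprite_x(start_x + frame // 2) for frame in range(frame_count)]


sequence = scroll_sequence(start_x=16, frame_count=8)
# Count how often the SPRxPOS part changes from one frame to the next.
pos_changes = sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev[0] != cur[0])
print(pos_changes)  # 1
```

Within any four consecutive frames the SPRxPOS part changes at most once, which is what leaves room to spread the SPRxPOS rewrite over four frames.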

The program takes advantage of this by updating a second copperlist while the first is being displayed. This allows the update of the sprite positions to be spread over four frames. There are 19 sprites shown on screen (288/16=18, plus 1 for shifting). Therefore, the program updates 4.75 columns (=1064 words) worth of sprite positions in the copper list each frame. It uses the blitter in clear mode (with the clear pattern set to the new value of SPRxPOS) to update each column.

After four frames, the copperlist is fully updated and ready for display, and the program swaps the displayed list with the just updated one.

The second element of the update method is updating the sprite data. Updating the sprite data is more work than updating the sprite positions. Not only is there more data to update (38 columns vs 19 columns) but the data used is in tile format so many more blits are needed. To cope with this higher workload, the program uses another two copper lists. This doubles the workload - apart from the DMA sprites, which do not need four copies.

The grand total of columns that need to be updated is therefore 22*2 (copperlist columns)+16 (DMA sprite)=60 columns. However, since sprites are 16 pixels wide and only one pixel is scrolled every other frame, there are 32 frames in total to update the two copper lists plus DMA sprites. As a result of this, the program updates 1.875 columns (=420 words) worth of sprite data in a copperlist or DMA sprite each frame.
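
The per-frame workloads quoted for the position and data updates follow from dividing the number of columns by the number of frames available and multiplying by the 224 lines of the effect. A minimal sketch of that arithmetic, with `update_load` being a helper name invented for this example:

```python
from fractions import Fraction

EFFECT_LINES = 224  # height of the sprite layer in scanlines


def update_load(columns, frames, lines=EFFECT_LINES):
    """Return (columns per frame, words per frame) for a spread-out update."""
    columns_per_frame = Fraction(columns, frames)
    words_per_frame = columns_per_frame * lines  # one word per column per line
    return columns_per_frame, words_per_frame


position_load = update_load(columns=19, frames=4)
data_load = update_load(columns=22 * 2 + 16, frames=32)
print(float(position_load[0]), int(position_load[1]))  # 4.75 1064
print(float(data_load[0]), int(data_load[1]))          # 1.875 420
```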

After 32 frames are shown, the two copperlists that were just updated are swapped over with the two that were shown before and the cycle starts again. Note here that the update of sprite positions needs to start 4 frames before the new copperlists are swapped in.

Frame 28: update sprite positions for first copperlist in new pair (1/4)
Frame 29: update sprite positions for first copperlist in new pair (2/4)
Frame 30: update sprite positions for first copperlist in new pair (3/4)
Frame 31: update sprite positions for first copperlist in new pair (4/4)
Frame 32: show new copperlist pair

All this put together allows for a free form scrolling sprite layer with new tiles added as required.

*) Note that SPRxCTL and SPRxPOS actually control more than just the X position, but at this point all that is interesting is updating the X position.
Performance
The performance of this method is not that great as the Copper list requires a ton of DMA time to update all the sprites. However, the updating of the sprite data for scrolling is surprisingly cheap because it can be spread out over a 32 frame period (the background layer moves 1 pixel every other frame).

The total DMA cycles cost of the effect is directly proportional to the number of lines the effect is in use. For each line of the effect the Copper requires 85 DMA cycles (1 wait + 41 moves per line) and the Blitter copies 19 words over 4 frames, plus 22 words over 32 frames* (=6 words per frame / 12 DMA cycles per frame). This totals to 97 DMA cycles per scanline of the effect.
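
The 97 cycles per scanline can be checked with a few lines of arithmetic. This sketch uses the costs the article works with (3 DMA cycles per Copper WAIT, 2 per Copper MOVE, 2 per word handled by the Blitter) and rounds the Blitter words per frame up to a whole number, as the text does. The function names are made up for the example.

```python
import math

WAIT_CYCLES = 3            # DMA cycles per Copper WAIT
MOVE_CYCLES = 2            # DMA cycles per Copper MOVE
BLIT_CYCLES_PER_WORD = 2   # DMA cycles per word handled by the Blitter


def copper_cycles_per_line(moves, waits=1):
    return waits * WAIT_CYCLES + moves * MOVE_CYCLES


def blitter_cycles_per_line():
    # 19 position words spread over 4 frames plus 22 data words over 32 frames,
    # rounded up to whole words per frame.
    words_per_frame = math.ceil(19 / 4 + 22 / 32)
    return words_per_frame * BLIT_CYCLES_PER_WORD


copper = copper_cycles_per_line(moves=19 + 11 * 2)   # 41 moves
blitter = blitter_cycles_per_line()
print(copper, blitter, copper + blitter)             # 85 12 97
print((copper + blitter) * 224)                      # 21728
```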

At the chosen resolution of the effect (304x224), the program can draw 9 bobs (32x32) in a frame (or 10 if the effect doesn't need to scroll). If the effect is not displayed at all, the expected number of bobs would be closer to 19**. If a standard 128 pixel repeating pattern sprite layer were displayed instead, the expected number of bobs per frame would be around 14**. Lastly, a 224 pixel high 'Risky Woods' effect would allow for about 11 bobs and the actual screen setup as used in 'Risky Woods' would allow for about 13 bobs**.

Should this effect be changed to use the same resolution and screen split size as is used in Risky Woods (i.e. a 'game screen' of 304x176 pixels and a 'score panel' of 288x48 pixels), the number of bobs that can be shown using this effect increases to 12**.

Note: all numbers of bobs shown assume a very basic program - such as the effect demo program I've provided. Adding true game or demo logic WILL lower these numbers (but it is difficult to say by how much as this all depends on the logic overhead).

*) 19 words to update the sprite positions, 22 words for updating the sprite data for the 11 sprites that aren't covered by sprite DMA fetches.

**) See the 'calculations' tab for all the details if desired, the 'short' version follows:

Sprite layer effect costs
A standard sprite layer (224 pixels high) costs 8736 DMA cycles.
A 'Risky Woods' sprite layer (224 pixels high) costs 16800 DMA cycles.
A 'Risky Woods' sprite layer (176 pixels high) costs 13200 DMA cycles.

A static Free Form sprite layer (224 pixels high) costs 19040 DMA cycles.
A scrolling Free Form sprite layer (224 pixels high) costs 21728 DMA cycles.
A scrolling Free Form sprite layer (176 pixels high) costs 17072 DMA cycles.

Number of 32x32 bobs shown per frame
Normal screen (no effects): 50320/2304=21*0.9=19 bobs
Standard sprite layer: 46736-8736=38000 DMA cycles => 38000/2304=16*0.9=14 bobs
'Risky Woods' sprite layer: 46736-16800=29936 DMA cycles => 29936/2304=12*0.9=11 bobs
'Risky Woods' actual game screen: 47696-13200=34496 DMA cycles => 34496/2304=14*0.9=13 bobs

Static Free Form sprite layer: 46736-19040=27696 DMA cycles => 27696/2304=12*0.9=10 bobs
Scrolling Free Form sprite layer: 46736-21728=25008 DMA cycles => 25008/2304=10*0.9=9 bobs
Scrolling Free Form sprite layer (smaller screen): 49424-17072=32352 DMA cycles => 32352/2304=14*0.9=12 bobs
Memory Use
It should be noted that this effect can be very memory hungry, especially for the scrolling version as presented here*. In order to enhance performance, no fewer than four copperlists are used to create this effect. The effect also needs 8 sprites of 224 pixels. This memory overhead is in addition to the normal requirements for a triple buffered playfield. Sizes per element are as follows:

Each sprite will take up (224*4)+8=904 bytes. All sprites together will take 7232 bytes. They are double buffered, so in total 14464 bytes are used up by the DMA sprites.
Each copper list will take up (224*42)*4+512=38144 bytes**. All four together will take 152576 bytes.

This totals to 167040 bytes, or 163kb for the effect.

It's worth noting that memory use (just like performance) is directly proportional to the size of the effect. For example, lowering the effect from 224 lines to 176 lines will mean saving 34.5kb (the effect then requiring 128.5kb).
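
The memory figures can be recomputed for any effect height with a short sketch like the one below. It follows the sizes given above; the reading of the 8 extra bytes per sprite as the control words plus the end-of-data words is an assumption, and the function names are invented for the example.

```python
def sprite_bytes(lines):
    # 4 bytes (two data words) per line, plus 8 bytes that are assumed here
    # to be the control words and the end-of-data words.
    return lines * 4 + 8


def copper_list_bytes(lines, overhead=512):
    # 42 Copper instructions per line at 4 bytes each, plus fixed overhead
    # for colours, bitplane pointers and so on.
    return lines * 42 * 4 + overhead


def effect_bytes(lines):
    dma_sprites = 8 * sprite_bytes(lines) * 2     # 8 sprites, double buffered
    copper_lists = 4 * copper_list_bytes(lines)   # four Copper lists
    return dma_sprites + copper_lists


print(effect_bytes(224))                               # 167040
print((effect_bytes(224) - effect_bytes(176)) / 1024)  # 34.5
```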

Even so, in my opinion this effect is best suited for machines with at least 1mb of RAM installed. The example I provided does run with 512kb of chip RAM as total memory, but squeezing in the remainder of a game or demo might be tough in the free RAM left.

*) A static version will only need one copper list instead of four and as such is a lot less memory intensive.
**) The 512 bytes is what I count as overhead for setting all non-sprite layer elements in the copper list (such as colours, bitplane pointers, etc).
Calculations
This tab shows the detailed calculations I used to arrive at the numbers of bobs that can be expected to be shown on screen, given a particular size and type of effect. If you're not interested in a big bunch of numbers, this tab is best skipped.

System DMA available
The Amiga has 226*312=70512 DMA cycles available per frame. Of these, 2304 are used up by refresh and audio, leaving 68208 DMA cycles for use by the rest of the system.

Bitplane & Sprite DMA costs
Each plane costs 1 DMA cycle per word shown. Thus, the cost for displaying a bitmap is width/16*height*depth DMA cycles. Each DMA sprite channel costs 2 DMA cycles per line, or 16 DMA cycles per line for all 8 sprites.

Displaying 176 lines of sprites costs 16*176=2816 DMA cycles
Displaying 224 lines of sprites costs 16*224=3584 DMA cycles
Displaying a 304x224x4 bitmap costs (304/16)*224*4=17024 DMA cycles
Displaying a 304x176x4 bitmap costs (304/16)*176*4=13376 DMA cycles
Displaying a 288x16x3 bitmap costs (288/16)*16*3=864 DMA cycles
Displaying a 288x48x3 bitmap costs (288/16)*48*3=2592 DMA cycles
Displaying a 288x48x5 bitmap costs (288/16)*48*5=4320 DMA cycles

A 304x224x4 screen with a 288x16x3 'score panel' and no sprites uses 17888 DMA cycles, leaving 50320.
A 304x224x4 screen with a 288x16x3 'score panel' and full sprite DMA uses 21472 DMA cycles, leaving 46736.
A 304x176x4 screen with a 288x48x3 'score panel' and full sprite DMA uses 18784 DMA cycles, leaving 49424.
A 304x176x4 screen with a 288x48x5 'score panel' and full sprite DMA uses 20512 DMA cycles, leaving 47696.
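
For anyone wanting to try other screen layouts, the display budget can be sketched as follows. It uses the per-frame totals and per-word costs stated above; `cycles_left` and the tuple layout `(width, height, depth)` are conventions made up for this example.

```python
CYCLES_PER_FRAME = 226 * 312                        # 70512, as used in the text
REFRESH_AND_AUDIO = 2304
AVAILABLE = CYCLES_PER_FRAME - REFRESH_AND_AUDIO    # 68208


def bitmap_cycles(width, height, depth):
    # One DMA cycle per 16-pixel word, per line, per bitplane.
    return width // 16 * height * depth


def sprite_dma_cycles(lines, channels=8):
    # Two DMA cycles per sprite channel per line.
    return 2 * channels * lines


def cycles_left(game, panel, sprite_lines=0):
    used = (bitmap_cycles(*game) + bitmap_cycles(*panel)
            + sprite_dma_cycles(sprite_lines))
    return AVAILABLE - used


print(cycles_left((304, 224, 4), (288, 16, 3)))                    # 50320
print(cycles_left((304, 224, 4), (288, 16, 3), sprite_lines=224))  # 46736
print(cycles_left((304, 176, 4), (288, 48, 5), sprite_lines=176))  # 47696
```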

Blitter bob cost
The blitter takes 2 DMA cycles per word copied and 4 DMA cycles per word cookie-cut blit. Blitting a bob requires one additional word per line copied to allow for shifting.
Blitting a 32x32x4 bob (with restore from 3rd buffer) therefore costs 3 words*32 lines*4 planes*6 DMA cycles = 2304 DMA cycles.

Blitter efficiency is around 90% - around 10% is lost due to Blitter setup and other overhead.

Copper cost
One Copper WAIT takes 3 DMA cycles and one Copper MOVE takes 2 DMA cycles.

Standard sprite layer = 1 wait + 18 moves per line = 39 DMA cycles per line*224    =  8736 DMA cycles total
Risky Woods sprite layer = 1 wait + 36 moves per line = 75 DMA cycles per line*224 = 16800 DMA cycles total
Risky Woods sprite layer@176 lines = 1 wait + 36 moves per line = 75 DMA cycles per line*176 = 13200 DMA cycles total
Freeform sprite layer = 1 wait + 19 moves (reposition) + 11*2 moves (new data) = 1 wait + 41 moves = 85 DMA cycles per line*224 = 19040 DMA cycles total
Freeform sprite layer/scrolling = 85 DMA cycles (Copper) + 12 DMA cycles (Blitter) = 97 DMA cycles per line*224 = 21728 DMA cycles total
Freeform sprite layer/scrolling@176 lines = 97 DMA cycles per line*176          = 17072 DMA cycles total

As a result of the above, we reach the following number of 32x32 bobs shown per frame

Normal screen (no effects): 50320/2304=21*0.9=19 bobs
Standard sprite layer: 46736-8736=38000 DMA cycles => 38000/2304=16*0.9=14 bobs
'Risky Woods' sprite layer: 46736-16800=29936 DMA cycles => 29936/2304=12*0.9=11 bobs
'Risky Woods' actual game screen: 47696-13200=34496 DMA cycles => 34496/2304=14*0.9=13 bobs

Static Free Form sprite layer: 46736-19040=27696 DMA cycles => 27696/2304=12*0.9=10 bobs
Scrolling Free Form sprite layer: 46736-21728=25008 DMA cycles => 25008/2304=10*0.9=9 bobs
Scrolling Free Form sprite layer (smaller screen): 49424-17072=32352 DMA cycles => 32352/2304=14*0.9=12 bobs
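
The bob counts can be reproduced with the sketch below. It assumes the 90% Blitter efficiency is applied to the free cycles before rounding down to whole bobs, which gives the same results as the figures listed. For the smaller Free Form screen it assumes a 3 bitplane score panel, the configuration that leaves 32352 free cycles. `bobs_per_frame` is a name invented for this example.

```python
BOB_CYCLES = 3 * 32 * 4 * 6   # 2304 DMA cycles per 32x32x4 bob
EFFICIENCY_PERCENT = 90       # roughly 10% lost to Blitter setup and overhead


def bobs_per_frame(free_cycles, effect_cycles=0):
    """Whole bobs that fit in the cycles left after display and effect costs."""
    remaining = free_cycles - effect_cycles
    return remaining * EFFICIENCY_PERCENT // (BOB_CYCLES * 100)


print(bobs_per_frame(50320))          # 19, no effect
print(bobs_per_frame(46736, 8736))    # 14, standard sprite layer
print(bobs_per_frame(46736, 16800))   # 11, 'Risky Woods' layer, 224 lines
print(bobs_per_frame(47696, 13200))   # 13, 'Risky Woods' actual game screen
print(bobs_per_frame(46736, 19040))   # 10, static Free Form layer
print(bobs_per_frame(46736, 21728))   # 9, scrolling Free Form layer
print(bobs_per_frame(49424, 17072))   # 12, scrolling Free Form, smaller screen
```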
Once I verified that this configuration worked, I updated the program to allow for the background layer to scroll (see the 'Scrolling' tab) and added a scrolling foreground layer. Lastly, I added some bobs to display on top to finish the example. Overall, I believe this method allows for impressive looking dual layered screens, especially as it allows for a 15 colour foreground with a 4 colour background - which I feel is a nicer split than Dual Playfield.

Note that the foreground tiles are (quite obviously) not my own design and are only used to show the effect in action.

Also note: it is still possible to add copper reloading of some palette entries as well. I did not do this (mainly to simplify the background layer update routines and because the effect is quite expensive as is), but there is actually enough DMA time left to change some or all of the sprite colours each scanline.
Above: the Free Form Sprite Layer program explained and shown in action.
Now, there is one elephant in the room here - this effect takes a lot of raster time and as such not all games/demos might be able to use it equally well. However, there are several things that can be done to improve the situation. Here are a few examples of ways to improve performance:

  • Do the 'Risky Woods' trick of keeping effect/screen size down somewhat
  • Mix the effect with a band of 'standard' sprite background layer
  • Have a part of the game/demo area permanently covered with foreground graphics (like most stages in Rygar)
  • Have a band without any effect at all (for instance, keep the top clear, like most stages in Rygar)
  • Have an area of the screen where the background layer is made up of hardware sprites without any repositioning (such as moving clouds <= 128 pixels wide)
Given all the pros and cons, I personally believe this is an interesting effect that could be used to improve the visuals of at least some games. All code, apart from the Startup code (which was made by Photon of Scoopex) was written by me and is (C) 2018 Jeroen Knoester. Foreground tiles are obviously not mine.

That said, please do use any part of my code or this idea you find useful. A credit/mention would be nice but is not required in any way. The program, source code and a bootable .ADF can be found in the downloads section.

If you have any questions, be sure to contact me through the contact form!

*) For OCS/ECS systems running in lores 4 bitplanes or less. I have not looked into an AGA version, but considering 64 bit bitplane fetch possibilities it is probably possible to update one sprite every 24 pixels up to 8 bitplanes/lores.