# TrueReality - RSP documentation                                       #
# Copyright (C) 1999 Niki W. Waibel                                     #
#                                                                       #
# This program is free software; you can redistribute it and/           #
# or modify it under the terms of the GNU General Public Li-            #
# cence as published by the Free Software Foundation; either            #
# version 2 of the Licence, or any later version.                       #
#                                                                       #
# This program is distributed in the hope that it will be use-          #
# ful, but WITHOUT ANY WARRANTY; without even the implied war-          #
# ranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.         #
# See the GNU General Public Licence for more details.                  #
#                                                                       #
# You should have received a copy of the GNU General Public             #
# Licence along with this program; if not, write to the Free            #
# Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139,         #
# USA.                                                                  #
#                                                                       #
# Information about me (the author):                                    #
#   Niki W. Waibel, Reichenau 20, 6890 Lustenau, Austria - EUROPE       #
#   niki.waibel@gmx.net                                                 #


=========================================================================
   This is some sort of RSP (Reality Signal Processor?) documentation.
=========================================================================


All of this stuff was figured out by disassembling/emulating various demos
with TrueReality.
All things which I'm not sure about are marked with a '?'.
All opcodes marked with an '*' were not figured out by myself. Currently I
do not know if the author wants to be named.





General things:
 o The RSP is some sort of R4000 cpu with fewer instructions.
   There are for example no 64bit instructions.
   BUT!, there are some important ;) additional vector instructions!!!
   See sections 'Registers' / 'Instruction Map'.

 o Special COP0 / COP2.
   No COP1 (FPU).
   See sections 'COP0' / 'COP2'.

 o The RSP has 4K instruction and 4K data memory which is mapped into the
   N64 address space:
    - data mem (DMEM):        0x04000000 - 0x04000fff
    - instruction mem (IMEM): 0x04001000 - 0x04001fff

 o There are 32 vector (128bit) registers, each divided into 8 elements
   (16bit each).

 o There are 4? (16bit?) flags

 o There are 8? hidden? 32bit? accumulators.
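
The DMEM/IMEM mapping listed above can be turned into a small address
decoder. This is only a minimal sketch in C, assuming an emulator that keeps
both memories as separate 4K blocks. The names rsp_mem_region and
rsp_decode_addr are made up for illustration and are not taken from
TrueReality; only the two address ranges come from the list above.

```c
#include <stdint.h>

/* Hypothetical names; only the address ranges come from the text above. */
enum rsp_mem_region { RSP_NONE, RSP_DMEM, RSP_IMEM };

/*
 * Classify an N64 address. On a hit, *offset receives the byte offset
 * inside the 4K block; otherwise it is set to 0.
 */
static enum rsp_mem_region rsp_decode_addr(uint32_t addr, uint32_t *offset)
{
    if (addr >= 0x04000000u && addr <= 0x04000fffu) {
        *offset = addr & 0xfffu;
        return RSP_DMEM;
    }
    if (addr >= 0x04001000u && addr <= 0x04001fffu) {
        *offset = addr & 0xfffu;
        return RSP_IMEM;
    }
    *offset = 0;
    return RSP_NONE;
}
```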





Registers:
        o 32 General Purpose Registers (GPR):  32bit? each
                Work as in a "normal" MIPS cpu.
                For example the register
                R00
                is always zero.

        o  4? flags (COP2 CCR):                16bit? each
                - flag[0]:
                        BITS: fedcba9876543210
                              zzzzzzzzcccccccc
                        0-7: These seem to be some sort of CARRY BITS for each
                             element.
                        f-8: These seem to be some sort of ZERO BITS for each
                             element. 0 means zero.
                - flag[1]:
                        BITS: fedcba9876543210
                              ????????cccccccc
                        0-7: These seem to be some sort of COMPARISON BITs for
                        each element.
                - flag[2]:
                        ?
                - flag[3]:
                        ?

        o 32 vector registers:                128bit each
                These are divided into 8 units/elements each.
                See section 'Elements' for more info.

        o ... ???





Elements:

        v ...         Vector (128bit)
        h ...    Half vector ( 64bit)
        q ... Quarter vector ( 32bit)
        w ...          Word? ( 16bit)

        RS: The 5th bit is always set!
            Instruction Map (COP2 rs (bits 25-21)) section
            will explain that!!!

         RS      Elem      Elements used
        ----    ------    ---------------
         10        []     0 1 2 3 4 5 6 7
         11?       []?    0 1 2 3 4 5 6 7?
         12      [0q]     0 0 2 2 4 4 6 6
         13      [1q]     1 1 3 3 5 5 7 7
         14      [0h]     0 0 0 0 4 4 4 4
         15      [1h]     1 1 1 1 5 5 5 5
         16      [2h]     2 2 2 2 6 6 6 6
         17      [3h]     3 3 3 3 7 7 7 7
         18       [0]     0 0 0 0 0 0 0 0
         19       [1]     1 1 1 1 1 1 1 1
         1a       [2]     2 2 2 2 2 2 2 2
         1b       [3]     3 3 3 3 3 3 3 3
         1c       [4]     4 4 4 4 4 4 4 4
         1d       [5]     5 5 5 5 5 5 5 5
         1e       [6]     6 6 6 6 6 6 6 6
         1f       [7]     7 7 7 7 7 7 7 7
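
The table above can be expressed as a small lookup function. The following
C sketch returns, for a given RS value and lane (0-7), the index of the
element that is used. The function name rsp_select_element is an
assumption for illustration. RS = 11 is marked with a '?' in the table and
is treated here like RS = 10.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the element that lane `lane` (0-7) uses,
 * given the 5 bit RS field (0x10-0x1f) of a vector instruction.
 */
static unsigned rsp_select_element(unsigned rs, unsigned lane)
{
    unsigned e = rs & 0x0fu;              /* drop the always-set 5th bit */

    assert(lane < 8);
    if (e < 2)                            /* []   : every lane uses itself */
        return lane;
    if (e < 4)                            /* [nq] : pairs of lanes         */
        return (lane & ~1u) | (e & 1u);
    if (e < 8)                            /* [nh] : groups of four lanes   */
        return (lane & ~3u) | (e & 3u);
    return e & 7u;                        /* [n]  : one element everywhere */
}
```

For example, RS = 12 ([0q]) gives 0 0 2 2 4 4 6 6 for lanes 0 to 7, as in
the table.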





COP0:
    GPR   N64 Register
    ---   ------------
      0   SP memory address
      1   SP DRAM DMA address
      2   SP read DMA length
      3   SP write DMA length
      4   SP status
      5   SP DMA full
      6   SP DMA busy
      7   SP semaphore
      8   DP CMD DMA start
      9   DP CMD DMA end
     10   DP CMD DMA current
     11   DP CMD status
     12   DP clock counter
     13   DP buffer busy counter
     14   DP pipe busy counter
     15   DP TMEM load counter
    As you might know these registers are also mapped in the N64 address space.
    With cop0 you can access them.
    So DMA transfers from/to DMEM/IMEM can be done by CPU or by RSP.





COP2:
    You can access the vector registers via MFC2/MTC2 and
    the flags via CFC2/CTC2.
    All vector instructions are cop2 functions.





Instruction Map:

 Opcode (bits 31-26):

  /========================================================================\ 
 ||   28-26|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||=31-29=/+===============================================================||
 ||   0   ||Special|RegIMM |   J   |  JAL  |  BEQ  |  BNE  | BLEZ  | BGTZ  ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   || ADDI  | ADDIU | SLTI  | SLTIU | ANDI  |  ORI  | XORI  |  LUI  ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   || COP0  |  ---  | COP2  |  ---  |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   4   ||  LB   |  LH   |  ---? |  LW   |  LBU  |  LHU  |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   5   ||  SB   |  SH   |  ---? |  SW   |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   6   ||  ---? |  ---? | LWC2  |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   7   ||  ---? |  ---? | SWC2  |  ---? |  ---? |  ---? |  ---? |  ---? ||
  \========================================================================/ 

 SPECIAL function (bits 5-0):

  /========================================================================\
 ||     2-0|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||==5-3==/+===============================================================||
 ||   0   ||  SLL  |  ---? |  SRL  |  SRA  | SLLV  |  ---? | SRLV  | SRAV  ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   ||  JR   | JALR  |  ---? |  ---? |  ---? | BREAK |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   4   ||  ADD  | ADDU  |  SUB  | SUBU  |  AND  |  OR   |  XOR  |  NOR  ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   5   ||  ---? |  ---? |  SLT  | SLTU  |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   6   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   7   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
  \========================================================================/

 REGIMM rt (bits 20-16):

  /========================================================================\
 ||   18-16|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||=20-19=/+===============================================================||
 ||   0   || BLTZ  | BGEZ  |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   ||BLTZAL |BGEZAL |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
  \========================================================================/

 COP0 rs (bits 25-21):

  /========================================================================\
 ||   23-21|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||=25-24=/+===============================================================||
 ||   0   || MFC0  |  ---? |  ---? |  ---? | MTC0  |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
  \========================================================================/

 COP2 rs (bits 25-21):

  /========================================================================\
 ||   23-21|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||=25-24=/+===============================================================||
 ||   0   || MFC2  |  ---? | CFC2  |  ---? | MTC2  |  ---? | CTC2  |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   ||  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   ||VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   ||VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP |VECTOP ||
  \========================================================================/

 VECTOP function (bits 5-0):

  /========================================================================\
 ||     2-0|   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7   ||
 ||==5-3==/+===============================================================||
 ||   0   || VMULF | VMULU | VRNDP | VMULQ | VMUDL | VMUDM | VMUDN | VMUDH ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   1   || VMACF | VMACU | VRNDN | VMACQ | VMADL | VMADM | VMADN | VMADH ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   2   || VADD  | VSUB  | VSUT? | VABS  | VADDC | VSUBC | VADDB | VSUBB ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   3   || VACCB | VSUCB | VSAD  | VSAC  | VSUM  | VSAW  |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   4   ||  VLT  |  VEQ  |  VNE  |  VGE  |  VCL  |  VCH  |  VCR  | VMRG  ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   5   || VAND  | VNAND |  VOR  | VNOR  | VXOR  | VNXOR |  ---? |  ---? ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   6   || VRCP  | VRCPL | VRCPH | VMOV  | VRSQ  | VRSQL | VRSQH | VNOOP ||
 ||-------||-------+-------+-------+-------+-------+-------+-------+-------||
 ||   7   || VEXTT | VEXTQ | VEXTN |  ---? | VINST | VINSQ | VINSN |  ---? ||
  \========================================================================/
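
Each table cell above corresponds to (row * 8 + column) of the named bit
field. A minimal C sketch of the field extraction follows. The function
names are hypothetical; the bit positions are the ones given in the table
headings.

```c
#include <stdint.h>

/* Field extractors for a 32 bit instruction word (bit numbers as in the tables). */
static unsigned insn_opcode(uint32_t w) { return (w >> 26) & 0x3fu; } /* bits 31-26 */
static unsigned insn_rs(uint32_t w)     { return (w >> 21) & 0x1fu; } /* bits 25-21 */
static unsigned insn_rt(uint32_t w)     { return (w >> 16) & 0x1fu; } /* bits 20-16 */
static unsigned insn_funct(uint32_t w)  { return w & 0x3fu; }         /* bits 5-0   */

/*
 * COP2 sits in opcode row 2, column 2 (2 * 8 + 2 = 0x12).
 * A COP2 word is a VECTOP when bit 25 (the top bit of rs) is set,
 * which covers rows 2 and 3 of the COP2 rs table.
 */
static int insn_is_vectop(uint32_t w)
{
    return insn_opcode(w) == 0x12u && (insn_rs(w) & 0x10u) != 0;
}
```

With this, the word 0x4A000010 decodes as a VECTOP with function 0x10
(row 2, column 0 of the VECTOP table: VADD).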





Instructions:
        [b_el]  Selects a byte (8bit) from a 128 bit vector.
                It always has to be a value between 0-15?. So I do a
                [b_el] &= 0x0f;
                Even: High portion of element
                Odd:  Low portion of element
                Example: v3[2] refers to the following byte of v3.
                         v3: xxxx XXxx xxxx xxxx xxxx xxxx xxxx xxxx
        [el]    See section 'Elements' above!
        [h_el]  Selects 1 element (16bit) from a 128 bit vector.
                It always has to be a value between 0-7?. So I do a
                [h_el] &= 0x07;
                Example: v3[2] refers to the following element of v3.
                         v3: xxxx xxxx XXXX xxxx xxxx xxxx xxxx xxxx
        v       One of the 32 (128bit) vectors (source or destination).
        vs      The source vector.
        vs1     The first source vector.
        vs2     The 2nd source vector.
        vd      The destination vector.
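
The [b_el] and [h_el] selectors can be sketched in C as follows. This
assumes a vector is stored as 8 elements of 16 bit with element 0 being the
leftmost one in the diagrams above. The type rsp_vec and the function
names are made up for illustration.

```c
#include <stdint.h>

/* One vector register as 8 elements of 16 bit; element 0 is the leftmost one. */
typedef struct { uint16_t e[8]; } rsp_vec;

/* Read the byte selected by [b_el]: even = high portion, odd = low portion. */
static uint8_t vec_get_byte(const rsp_vec *v, unsigned b_el)
{
    uint16_t elem;

    b_el &= 0x0fu;
    elem = v->e[b_el >> 1];
    return (b_el & 1u) ? (uint8_t)(elem & 0xffu) : (uint8_t)(elem >> 8);
}

/* Read the element selected by [h_el]. */
static uint16_t vec_get_elem(const rsp_vec *v, unsigned h_el)
{
    return v->e[h_el & 0x07u];
}
```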



    Let's start!




        VABS vd, vs1, vs2[el]
        Vector ABSolute?  
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 1 0 0 1 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VABS

                VABS changes the sign of vs2[el] depending on vs1 and stores it
                in vd.
                vs1 > 0: do not change
                vs1 < 0: change sign
                vs1 = 0: set result in vd to zero
                ATTENTION:
                        If the sign of 0x8000 is changed it results in a 0x7fff
                        instead of 0x8000!
                        That is because 0x8000 is normally not defined as a
                        value in signed 16 bit numbers. The RCP does not care
                        about that.
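
                The rules above, for a single element, as a minimal C
                sketch. The function name vabs_lane is an assumption;
                s1 is the element of vs1 and s2 the selected element of
                vs2[el].

```c
#include <stdint.h>

/* One lane of VABS as described above. */
static int16_t vabs_lane(int16_t s1, int16_t s2)
{
    if (s1 > 0)
        return s2;                 /* vs1 > 0: do not change            */
    if (s1 < 0) {
        if (s2 == INT16_MIN)       /* changing the sign of 0x8000 ...   */
            return INT16_MAX;      /* ... results in 0x7fff             */
        return (int16_t)-s2;       /* vs1 < 0: change sign              */
    }
    return 0;                      /* vs1 = 0: result is zero           */
}
```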



        VADD vd, vs1, vs2[el]
        Vector ADDition
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 1 0 0 0 0|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VADD

                VADD takes vs1, adds it to vs2[el] (in a signed way!) and
                stores the result in vd.
                If the CARRY BIT is set in flag[0] it adds 1 to that
                element additionally.
                Then it checks if over/underflow has occurred.
                If true then the result in vd clamps to 0x7fff/0x8000.
                It clears all the CARRY BITs?.



        VADDC vd, vs1, vs2[el]
        Vector ADDition and set Carry
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 1 0 1 0 0|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VADDC

                VADDC takes vs1, adds it to vs2[el] (in an unsigned way!) and
                stores the result in vd.
                It sets the CARRY BITs if over/underflow has occurred.
                It does not care about the CARRY BITs when calculating the
                result.
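
                One element of VADDC as a minimal C sketch. The name
                vaddc_lane and the carry_out parameter are assumptions
                for illustration.

```c
#include <stdint.h>

/* One lane of VADDC: unsigned 16 bit add; *carry_out receives the new CARRY BIT. */
static uint16_t vaddc_lane(uint16_t s1, uint16_t s2, unsigned *carry_out)
{
    uint32_t sum = (uint32_t)s1 + (uint32_t)s2;   /* incoming carry is ignored */

    *carry_out = (sum >> 16) & 1u;                /* bit 16 is the overflow    */
    return (uint16_t)sum;
}
```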



        VEQ vd, vs1, vs2[el]
        compare Vector if EQual
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|1 0 0 0 0 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VEQ

                VEQ checks if (vs1 == vs2[el]).
                If true the COMPARISON BIT is set to 1 otherwise it is
                cleared.
                If a ZERO BIT is set it clears that COMPARISON BIT anyway.
                The CARRY BITs are not used?.
                vs2[el] is copied to vd.
                flag[0] (CARRY and ZERO BITs) is cleared.



        VGE vd, vs1, vs2[el]
        compare Vector if Greater or Equal
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|1 0 0 0 1 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VGE

                VGE checks if (vs1 >= vs2[el]) (in a signed way!).
                If true the COMPARISON BIT is set to 1 otherwise it is
                cleared.
                If a CARRY BIT is set AND! if that element of vs1 and vs2[el]
                are equal it clears that COMPARISON BIT anyway.
                The ZERO BITs are not used?.
                The elements of vd are taken from vs1 or vs2[el] depending on
                the COMPARISON BITs.
                If a COMPARISON BIT is set to 1 vs1 is taken otherwise vs2[el]
                is taken.
                flag[0] (CARRY and ZERO BITs) is cleared.
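
                One element of VGE as a minimal C sketch, following the
                description above. The name vge_lane is hypothetical.
                Clearing flag[0] afterwards is left to the caller.

```c
#include <stdint.h>

/* One lane of VGE. Returns the vd element; *cmp_out receives the COMPARISON BIT. */
static int16_t vge_lane(int16_t s1, int16_t s2, unsigned carry, unsigned *cmp_out)
{
    unsigned cmp = (s1 >= s2);

    if (carry && s1 == s2)   /* equal elements with CARRY set clear the bit */
        cmp = 0;
    *cmp_out = cmp;
    return cmp ? s1 : s2;    /* bit set: take vs1, otherwise vs2[el] */
}
```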



        VLT vd, vs1, vs2[el]
        compare Vector if Less Than
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|1 0 0 0 0 0|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VLT

                VLT checks if (vs1 < vs2[el]) (in a signed way!).
                If true the COMPARISON BIT is set to 1 otherwise it is
                cleared.
                If a CARRY BIT is set AND! if that element of vs1 and vs2[el]
                are equal it sets that COMPARISON BIT anyway.
                The ZERO BITs are not used?.
                The elements of vd are taken from vs1 or vs2[el] depending on
                the COMPARISON BITs.
                If a COMPARISON BIT is set to 1 vs1 is taken otherwise vs2[el]
                is taken.
                flag[0] (CARRY and ZERO BITs) is cleared.



        VMOV vd[h_el], vs[h_el]
        Vector MOVe
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|0?x x x x|x x x x x|1 1 0 0 1 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs        [h_el]    vd        VMOV

                VMOV takes vs[h_el] and stores it in vd[h_el].



        VMRG vd, vs1, vs2[el]
        Vector MeRGe?
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|1 0 0 1 1 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VMRG

                VMRG stores the elements of vs1 or vs2[el] in vd.
                If the COMPARISON BIT is 1 it takes the element from vs1
                otherwise from vs2[el].
                The COMPARISON BITs correspond in the following way to the
                elements:
                        COMP. BIT: 7 6 5 4 3 2 1 0
                        ELEMENT  : 7 6 5 4 3 2 1 0
                Cool eh?
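
                VMRG over all 8 elements as a minimal C sketch. The name
                vmrg is an assumption, and vs2 is expected to be already
                expanded according to [el]. flag1 holds the COMPARISON
                BITs (flag[1]).

```c
#include <stdint.h>

/* VMRG: COMPARISON BIT i selects between vs1[i] (bit = 1) and vs2[i] (bit = 0). */
static void vmrg(uint16_t vd[8], const uint16_t vs1[8], const uint16_t vs2[8],
                 uint16_t flag1)
{
    unsigned i;

    for (i = 0; i < 8; i++)
        vd[i] = ((flag1 >> i) & 1u) ? vs1[i] : vs2[i];
}
```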



        *VMUDH vd, vs1, vs2[el]
        Vector MUltiply D? High?
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 0 0 1 1 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VMUDH

                The first element in vs1 is sign extended to a 32 bit value and
                is then multiplied by vs2[el], which is also sign extended to a
                32 bit value. If this calculated value, when combined with the
                value of 0x8000 in a bit-wise logical AND operation, equals 0
                and the most significant 16 bits of the value do not equal 0,
                then the specific accumulator equals 0x7FFFFFFF. If the
                calculated value, when combined with the value of 0x8000 in a
                bit-wise logical AND operation, does not equal 0 and the most
                significant 16 bits of the value do not equal 0xFFFF, then
                the specific accumulator equals 0x80000000. If neither of these
                conditions is true then the calculated value is shifted left 8
                bits, inserting zero in the low-order bits, and the
                specific accumulator equals this value. The first element of
                the register vd is equal to the upper half of the specific
                accumulator. This same operation is then repeated on the other
                7 elements and their corresponding accumulators.  :)



        VNE vd, vs1, vs2[el]
        compare Vector if Not EQual
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|1 0 0 0 1 0|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VNE

                VNE checks if (vs1 != vs2[el]).
                If true the COMPARISON BIT is set to 1 otherwise it is
                cleared.
                If a ZERO BIT is set it sets that COMPARISON BIT anyway.
                The CARRY BITs are not used?.
                vs1 is copied to vd.
                flag[0] (CARRY and ZERO BITs) is cleared.



        VSUB vd, vs1, vs2[el]
        Vector SUBtraction
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 1 0 0 0 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VSUB

                VSUB takes vs1, subtracts it from vs2[el] (in a signed way!) and
                stores the result in vd.
                If the CARRY BIT is set in flag[0] it additionally subtracts 1
                from that element.
                Then it checks if over/underflow has occurred.
                If true then the result in vd clamps to 0x7fff/0x8000.
                It clears all the CARRY BITs?.



        VSUBC vd, vs1, vs2[el]
        Vector SUBtraction and set Carry
        +-----------+---------+---------+---------+---------+-----------+
        |0 1 0 0 1 0|1 x x x x|x x x x x|x x x x x|x x x x x|0 1 0 1 0 1|
        +-----------+---------+---------+---------+---------+-----------+
         COP2        VECTOP el vs2       vs1       vd        VSUBC

                VSUBC takes vs1, subtracts it from vs2[el] (in an unsigned way!)
                and stores the result in vd.
                It sets the CARRY BITs if over/underflow has occurred.
                It sets the ZERO BIT if element of vd is zero.
                It does not care about the CARRY BITs when calculating the
                result.



        LWC2
        SWC2
                The LWC2/SWC2 instructions do not work normally.
                They are split into 12? LxV and 12? SxV instructions. The RD
                field does the separation in the following way:
                 RD    Instr    Multipl   Does
                ---- --------- --------- ---------------------------------------
                  0   LBV/SBV      1      Load/Store (  8bit (Byte)   ) Vector.
                  1   LSV/SSV      2      Load/Store ( 16bit (Short?) ) Vector.
                  2   LLV/SLV      4      Load/Store ( 32bit (Long?)  ) Vector.
                  3   LDV/SDV      8      Load/Store ( 64bit (Double?)) Vector.
                  4   LQV/SQV     16      Load/Store (128bit (Quad?)  ) Vector.
                  5   LRV/SRV     16      Load/Store (128bit (Right)  ) Vector.
                  6   LPV/SPV      8      Load/Store ( 64bit (Packet) ) Vector.
                  7   LUV/SUV      8      Load/Store ( 64bit (Upper?) ) Vector.
                  8   LHV/SHV     16      Load/Store (128bit (Half?)  ) Vector.
                  9   LFV/SFV     16      Load/Store (128bit (Fourth?)) Vector.
                  a   LWV/SWV     16      Load/Store (128bit (Warp?)  ) Vector.
                  b   LTV/STV     16      Load/Store (128bit (Transp?)) Vector.

                MIPS R4300 mnemonic:
                bits:    6      5     5     5     5     6
                        +------+-----+-----+-----+-----+------+
                        |XXXXXX|XXXXX|XXXXX|XXXXX|XXXXX|XXXXXX|
                        +------+-----+-----+-----+-----+------+
                fields:  opcode rs    rt    rd    sa    funct

                RCP LWC2/SWC2 mnemonic:
                bits:    6      5     5     5     4!!! 7!!!
                        +------+-----+-----+-----+----+-------+
                        |11x010|XXXXX|XXXXX|XXXXX|XXXX|XXXXXXX| 
                        +------+-----+-----+-----+----+-------+ 
                MIPS R4300
                fields:  opcode rs    rt    rd   sa>>1 ((sa&1)<<6)|funct
                     x=0: LWC2  r     vd    i    el    o
                     x=1: SWC2             instr       offset value
                                           select      (has to be multiplied!)

                        L{i}V {vd}[{elem}], Z*o({r})

                        Z: This value comes from the table above ('Multipl')!

                        It's a bit complicated, but you will understand it in
                        time. Pay attention to the offset. If bit 6 (the MSB) is
                        set the offset is negative. If you are an emu developer
                        then set the sign bits correctly. I warned you!

                Example:
                        LQV v03[00], 16(r04)

                Calculation of the memory address for starting peeking/poking
                the dmem (0x04000000 - 0x04000fff):
                        addr = (Multipl * o) + reg
                        
                        Multipl: is from the table above (depends on RD)
                        o:       7 bit offset value (sign!!!)
                        reg:     contents of the GPR # RS

                        I will call this 'addr' and I will refer to this
                        very often in the description of the functions.
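
                The field layout and the 'addr' calculation above can be
                sketched in C like this. All function and array names are
                made up for illustration. The sketch assumes the GPRs are
                available as an array of 32 values and that RD is one of
                the 12 entries of the table.

```c
#include <assert.h>
#include <stdint.h>

/* 'Multipl' column of the table above, indexed by the RD field (0x0-0xb). */
static const int lsv_multipl[12] = { 1, 2, 4, 8, 16, 16, 8, 8, 16, 16, 16, 16 };

/* Sign extend the 7 bit offset field (bits 6-0); bit 6 is the sign bit. */
static int lsv_offset(uint32_t w)
{
    int o = (int)(w & 0x7fu);

    return (o & 0x40) ? o - 0x80 : o;
}

/* The [el] field sits in bits 10-7. */
static unsigned lsv_el(uint32_t w)
{
    return (w >> 7) & 0x0fu;
}

/* addr = (Multipl * o) + reg, where reg is the content of GPR number RS. */
static uint32_t lsv_addr(uint32_t w, const uint32_t gpr[32])
{
    unsigned rd = (w >> 11) & 0x1fu;   /* instruction select */
    unsigned rs = (w >> 21) & 0x1fu;   /* base register      */

    assert(rd < 12);
    return gpr[rs] + (uint32_t)(lsv_multipl[rd] * lsv_offset(w));
}
```

                With r04 = 0x100, the word 0xC8832001 (LWC2, rs = 4,
                rt = 3, rd = 4, el = 0, o = 1) gives addr = 0x110, and
                the same word with o = 0x7f (that is -1) gives
                addr = 0xf0.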

                LBV/SBV v[b_el], o(r)
                Load/Store Byte Vector
                        Loads a byte (8bit) from memory into vector v.
                        Stores a byte (8bit) from vector v into memory.

                LSV/SSV v[x], o(r)
                Load/Store Single? Vector
                        Loads a half word (16bit) from memory into vector v.
                        Stores a half word (16bit) from vector v into memory.
                        x can be 0, 2, 4, 6, 8, 10, 12, or 14? selecting the
                        starting byte in a [b_el] way.

                LLV/SLV v[x], o(r)
                Load/Store Long? Vector
                        Loads a word (32bit) from memory into vector v.
                        Stores a word (32bit) from vector v into memory.
                        x can be 0, 4, 8 or 12? selecting the starting byte
                        in a [b_el] way.

                LDV/SDV v[x], o(r)
                Load/Store Double? Vector
                        Loads a double word (64bit) from memory into vector v.
                        Stores a double word (64bit) from vector v into memory.
                        x can be 0 or 8? selecting the starting byte in a
                        [b_el] way.

                LQV/SQV v[x], o(r)
                Load/Store Quad? Vector
                        Loads max 128 bits from memory into vector v.
                        Stores max 128 bits from vector v into memory.
                        x can be 0-15? selecting the starting byte in a [b_el] 
                        way.
                        Loading/storing ends when addr has reached a 16byte
                        boundary.
                        Example:
                                LQV v7[3], 16(r0)
                                loads
                                xxxx xxXX XXXX XXxx xxxx xxxx xxxx xxxx
                                of vector v7.
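
                        A minimal C sketch of the load direction, using only
                        the two rules stated above (start at byte x, stop at
                        the next 16byte boundary). The name lqv is
                        hypothetical. Stopping when the vector is full and
                        wrapping addr inside the 4K DMEM are assumptions of
                        this sketch.

```c
#include <stdint.h>

/*
 * LQV sketch: copy bytes from dmem into the 16 byte vector `v`, starting at
 * vector byte x, until addr reaches a 16 byte boundary or the vector is full.
 * Returns the number of bytes loaded.
 */
static unsigned lqv(uint8_t v[16], const uint8_t dmem[0x1000], uint32_t addr,
                    unsigned x)
{
    unsigned n = 0;

    x &= 0x0fu;
    do {
        v[x++] = dmem[addr++ & 0xfffu];
        n++;
    } while ((addr & 0x0fu) != 0 && x < 16);
    return n;
}
```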

                LRV/SRV v[x], o(r)
                Load/Store Right? Vector
                        Loads max 128 bits from memory into vector v.
                        Stores max 128 bits from vector v into memory.
                        x is always 0?.
                        Loading/storing starts from the right side of the
                        vector (the 7th element).
                        Loading/storing is done for addr&0xf bytes.
                        If addr&0xf == 0 then nothing happens?.
                        Example:
                                LRV v4[0], 0(r0)
                                loads
                                xxxx xxxx xxxx xxxx xxxx xxXX XXXX XXXX
                                of vector v4.

                LPV/SPV v[x], o(r)
                Load/Store Packet? Vector
                        Loads 8 bytes from memory into vector v.
                        Stores 8 bytes from vector v into memory.
                        x is always zero?.
                        The memory is accessed byte by byte.
                        Load:
                        The bytes are shifted left by 8 bits (->16bit value)
                        and each value is stored in an element.
                        Store:
                        The elements are shifted right by 8 bits (->8bit value)
                        and each value is stored in memory.

                LUV/SUV v[x], o(r)
                Load/Store Upper? Vector
                        Loads 8 bytes from memory into vector v.
                        Stores 8 bytes from vector v into memory.
                        x is always zero?.
                        The memory is accessed byte by byte.
                        Load:
                        The bytes are shifted left by 7 bits (->16bit value)
                        and each value is stored in an element.
                        Store:
                        The elements are shifted right by 7 bits (->8bit value)
                        and each value is stored in memory.
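
                        LPV/SPV and LUV/SUV differ only in the shift amount,
                        so both can be sketched in C with one pair of
                        functions. The names load_packed and store_packed are
                        made up for illustration. Wrapping the address inside
                        the 4K DMEM is an assumption of this sketch.

```c
#include <stdint.h>

/* Load 8 bytes, one per element, shifted left by `shift` (8 for LPV, 7 for LUV). */
static void load_packed(uint16_t v[8], const uint8_t dmem[0x1000],
                        uint32_t addr, unsigned shift)
{
    unsigned i;

    for (i = 0; i < 8; i++)
        v[i] = (uint16_t)(dmem[(addr + i) & 0xfffu] << shift);
}

/* Store 8 elements as bytes, shifted right by `shift` (8 for SPV, 7 for SUV). */
static void store_packed(const uint16_t v[8], uint8_t dmem[0x1000],
                         uint32_t addr, unsigned shift)
{
    unsigned i;

    for (i = 0; i < 8; i++)
        dmem[(addr + i) & 0xfffu] = (uint8_t)(v[i] >> shift);
}
```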

                LHV/SHV v[x], o(r)
                Load/Store Half? Vector
                        Loads 8 bytes from memory into vector v.
                        Stores 8 bytes from vector v into memory.
                        x is always zero?.
                        The high portion of a half word (16bit) is accessed in
                        memory. These 8 bits are loaded/stored from/in bit 7-14
                        of the element in the vector register.

                LFV/SFV v[x], o(r)
                Load/Store Fourth? Vector
                        Same as above except that it accesses the highest byte
                        of a word (32bit) in memory.
                        That means it loads/stores every 4th byte in memory.

                LWV/SWV v[x], o(r)
                Load/Store Warp? Vector
                        x can be 0-16?
                        Example:
                                LWV v2[6], 0(r0)
                                Mem @ 0-16 contains:
                                0011 2233 4455 6677 8899 aabb ccdd eeff
                                Vector after instruction:
                                v2: aabb ccdd eeff 0011 2233 4455 6677 8899

                                SWV v2[6], 0(r0)
                                v2: 0011 2233 4455 6677 8899 aabb ccdd eeff
                                Mem @ 0-16 contains after instruction:
                                6677 8899 aabb ccdd eeff 0011 2233 4455
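
                        The example can be read as a byte rotation by x.
                        The Python sketch below is derived only from the
                        x = 6 example; that other values of x rotate the
                        same way is an assumption. The register is
                        modelled as 16 bytes, and lwv and swv are
                        illustrative names.

```python
def lwv(mem, addr, x):
    """Memory byte i lands at register byte (i + x) & 15."""
    reg = bytearray(16)
    for i in range(16):
        reg[(i + x) & 15] = mem[addr + i]
    return reg


def swv(mem, addr, reg, x):
    """Memory byte i is taken from register byte (i + x) & 15."""
    for i in range(16):
        mem[addr + i] = reg[(i + x) & 15]


# Usage with the data from the example.
data = bytes.fromhex("00112233445566778899aabbccddeeff")
v2 = lwv(data, 0, 6)
out = bytearray(16)
swv(out, 0, data, 6)
```

                        In this model SWV is the exact inverse of LWV
                        for the same x.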

                LTV/STV v[x], o(r)
                Load/Store Transpose? Vector
                        These instructions were difficult to implement in
                        TrueReality.
                        They simply operate on half words (16bit) in memory
                        and on elements (16bit) in vectors.
                        x can be 0, 2, 4, 6, 8, 10, 12, or 14? It does
                        **!NOT!** select the starting byte in the [b_el] way.
                        It works the reverse [b_el] way:
                        2 means starting at element 7.
                        4 means starting at element 6.
                        6 means starting at element 5.
                        0 means starting at element 0.
                        Load:
                        The first address in memory is addr!
                        The first half word (16bit) of memory is loaded into
                        element (8-x/2)&0x7 on vector v.
                        The next half word (16bit) of memory is loaded into
                        element (8-x/2+1)&0x7 on vector v+1.
                        ...
                        Store:
                        The first address in memory is:
                        if(x>0)
                                addr+16-x
                        else
                                addr
                        !!! This differs from LTV !!!
                        Element (8-x/2)&0x7 on vector v is stored in the first
                        half word (16bit) of memory.
                        Element (8-x/2+1)&0x7 on vector v+1 is stored in the
                        next half word (16bit) of memory.
                        ...
                        ATTENTION:
                                If the memory pointer reaches the 16 byte
                                boundary it does **!NOT!** load/store the next
                                2 bytes in memory. It loads/stores the 2 bytes
                                at 'addr & ~0x0f'!
                                The next 2 bytes are at '(addr & ~0x0f) + 2',
                                etc.
                        Example (you will need it ;-) ):
                                LTV v2[6], 0(r0)
                                Mem @ 0-16 contains:
                                0011 2233 4455 6677 8899 aabb ccdd eeff

                                Vectors contain after instruction:
                                v2: xxxx xxxx xxxx xxxx xxxx 0011 xxxx xxxx
                                v3: xxxx xxxx xxxx xxxx xxxx xxxx 2233 xxxx
                                v4: xxxx xxxx xxxx xxxx xxxx xxxx xxxx 4455
                                v5: 6677 xxxx xxxx xxxx xxxx xxxx xxxx xxxx
                                v6: xxxx 8899 xxxx xxxx xxxx xxxx xxxx xxxx
                                v7: xxxx xxxx aabb xxxx xxxx xxxx xxxx xxxx
                                v8: xxxx xxxx xxxx ccdd xxxx xxxx xxxx xxxx
                                v9: xxxx xxxx xxxx xxxx eeff xxxx xxxx xxxx

                                STV v2[6], 0(r0)
                                Vectors contain:
                                v2: xxxx xxxx xxxx xxxx xxxx 0011 xxxx xxxx
                                v3: xxxx xxxx xxxx xxxx xxxx xxxx 2233 xxxx
                                v4: xxxx xxxx xxxx xxxx xxxx xxxx xxxx 4455
                                v5: 6677 xxxx xxxx xxxx xxxx xxxx xxxx xxxx
                                v6: xxxx 8899 xxxx xxxx xxxx xxxx xxxx xxxx
                                v7: xxxx xxxx aabb xxxx xxxx xxxx xxxx xxxx
                                v8: xxxx xxxx xxxx ccdd xxxx xxxx xxxx xxxx
                                v9: xxxx xxxx xxxx xxxx eeff xxxx xxxx xxxx

                                Mem @ 0-16 contains after instruction:
                                6677 8899 aabb ccdd eeff 0011 2233 4455

                                That means if you do an LTV and then the same
                                STV you will NOT get the same memory contents
                                as before!!!

                                Looks easy - but difficult to implement!!!
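
                        A minimal Python sketch that reproduces the
                        example above. It is not the TrueReality
                        implementation. Assumptions: addr is half word
                        aligned, registers are a list of lists of eight
                        16bit elements, v+7 stays inside the register
                        file, and the wrap happens inside the 16 byte
                        block the pointer is currently in. The start
                        element is computed as (8 - x/2) & 7 because
                        that gives the start elements listed above
                        (0->0, 2->7, 4->6, 6->5) and matches the
                        example. All names are illustrative.

```python
def next_ptr(ptr):
    """Advance by 2 bytes, wrapping to the start of the 16 byte block."""
    return (ptr & ~0x0F) | ((ptr + 2) & 0x0F)


def ltv(mem, addr, regs, v, x):
    """Load 8 half words, one into each of the vectors v .. v+7."""
    el = (8 - x // 2) & 7
    ptr = addr                      # LTV always starts at addr
    for n in range(8):
        regs[v + n][(el + n) & 7] = (mem[ptr] << 8) | mem[ptr + 1]
        ptr = next_ptr(ptr)


def stv(mem, addr, regs, v, x):
    """Store one element from each of the vectors v .. v+7."""
    el = (8 - x // 2) & 7
    ptr = addr + 16 - x if x > 0 else addr   # differs from LTV
    for n in range(8):
        value = regs[v + n][(el + n) & 7]
        mem[ptr] = (value >> 8) & 0xFF
        mem[ptr + 1] = value & 0xFF
        ptr = next_ptr(ptr)


# Usage with the data from the example.
data = bytes.fromhex("00112233445566778899aabbccddeeff")
regs = [[0] * 8 for _ in range(32)]
ltv(data, 0, regs, 2, 6)
out = bytearray(16)
stv(out, 0, regs, 2, 6)
```

                        After this, out holds the rotated memory image
                        from the example and not the original data.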
                        

                  ... more will come (hint: look into my source)!


 ... to be written!
