Thoughts on MidiSheetMusicMemo

最近我觉得《MidiSheetMusicMemo》有一个问题:在“问答模式”中玩一次的时间对于正常的谱子而言太冗长,以至于坚持到最后是一件很辛苦的事情。这本身并不是很大的问题,但是因为以现在的程序的状况,如果不坚持到最后,统计数据就不会更新,就不会有成就感,那就会让用户体验很不好。目前连俺作为开发者自身都没法认真玩完,对于其它用户可想而知。
就拿最近的《二部创意曲8》和《二部创意曲13》来说,虽然小节数不是很多,各有35个小节和25个小节,但是却要至少五分钟甚至是十分钟才能完成。即使这样的小曲子时间都如此之长,其原因之一可能是因为选项的个数太多:八选一从绝对数量来看是很多的,俺自己在使用时都会有一种“即使我能背下这个谱子,也不能很快地从八个选项中找到正确的选项,因为错误选项的干扰效应太强了。”的感觉。应该试试看三选一、四选一和六选一,不应该就从一开始就吊死在八选一这棵树上。
话说最近想尝试一种新的转场效果,并修复一些bug。希望年底之前有空做完吧。

发表在 Uncategorized | 留下评论

Cangjie input method mini game

It's here —> http://edgeofmap.com/cj
It's still under development.
I would want to gamify the learning process, creating a fun and rewarding experience. When I started learning 倉頡 in February I used a Win3.1 program (which was 16-bit!) called 「倉頡教室」.
It's first exercise was about the parts-keys. The purpose was to help remember which parts are on which [a-z] keys. The parts in Cangjie were divided into groups (Group 1=日月金木水火土, group 2=竹戈十大中一弓, group 3=人心手口, group 4=尸廿山女田卜) and so was the first exercise.
The second exercise teaches simple compound characters. Example: 明=日+月.
The web game has so far replicated the first exercise and part of the second exercise.
I was wondering this may be useful for not only those who're learning the input method, but also those who are learning Chinese characters, even without prior exposure to those characters.
Still this web app has a long way to go, either to becoming a fully usable Cangjie tutorial or a complete introduction to Chinese characters.

发表在 Uncategorized | 留下评论

PA9

PA9的简单的代码生成器

【确立「先把操作数读入临时寄存器、再将临时寄存器中的值写回目标操作数中」的code generator结构】

自从开始做第一个测试用例时,就会发现,开门第一件事情就是要知道会出现不同的搬运数据的情况。对于每个cy86指令,通常都有一个操作数(operand)。其可来自于寄存器([x,y,z,t][64,32,16,8])或来自内存地址([寄存器+常数] 或 [Label+常数]),或是立即数(immediate)。所以,对于每一个cy86的指令,都一定会是在以下的一个表格中的一种情况。

到哪里去\从哪里来 寄存器 内存地址 立即数
寄存器 情况1 情况2 情况3
内存地址 情况4 情况5 情况6

对于上面的6种情况的每一种,都要有一种相应的对策。需要相应的对策的原因是,用x86指令并不能直接达成某些功能,比如说mov就不能将某个内存地址中的数据移到某个内存地址,而是需要先从内存移到寄存器,再从寄存器移到内存。那个寄存器就是一个「临时寄存器」。
也就是说,对于同一个cy86指令,在产生x86 code的时候都需要分清楚这6种情况(来的时候三种、去的时候三种)。但那样就会有很多重复的code了。如果都经由临时寄存器,就可以避免在每一个非终结符的处理函数中都分六种情况,就可以简化code generator的撰写过程。
那不妨就将「经由临时寄存器」的手法套用到全部六种情况之上。这样的话,这六种情况会变成以下这样,虽然变麻烦了一点,但是却比较简单容易理解:

到哪里去\从哪里来 寄存器 内存地址 立即数
寄存器 先把源寄存器的数据存入临时寄存器
然后把临时寄存器存入目标寄存器。

先把源内存地址中的数据存入临时寄存器
然后把临时寄存器存入目标寄存器。

先把立即数存入临时寄存器
然后把临时寄存器的值存入目标寄存器。
内存地址 先把源寄存器的数据存入临时寄存器

然后把临时寄存器存入目标内存地址。

先把源内存地址中的数据存入临时寄存器

然后把临时寄存器存入目标内存地址。

先把立即数存入临时寄存器
然后把临时寄存器存入目标内存地址。

在以上的每个格子中,蓝色的部分是用于「读入源操作数」的。而红色的部分是用于「写入目标操作数」的。所谓「操作数」,在parser里面看到的其实是一个ASTNode*=语法树结点。

因为每个蓝色的情况都是读入操作数,所以这些就可以都用一套类似的函数来表示。
同样因为每个红色的情况都是将临时寄存器写入操作数,所以这些也都可以用一套类似的函数来表示。如下所示。

把源操作数的数据读入临时寄存器
(包括了所有蓝色的情况)
codeOperandToRegister(const char*, ASTNode*)

把临时寄存器写入目标操作数
(包括了所有红色的情况)

codeRegisterToOperand(ASTNode*, const char*)

以上的const char*表示临时寄存器的名字。
只要有了这些,就可以不用在每个非终结符中都处理六种情况了,而只要通通写codeOperandToRegister,然后对register进行相应的运算,然后再codeRegisterToOperand就可以了。
接下来的问题就是针对六种情况来撰写所需的x86 code。所以第一件事情就是需要知道,一共有哪些x86寄存器被用到了。

【为上文所述的结构来撰写所需的x86 code】

临时寄存器即用来存输入参数、也用来存输出参数。
所用过的临时寄存器有以下这些:

用途 所用过的临时寄存器 说明和例子
充当移动数据时的临时寄存器 ・rax, eax, ax, al ・move8 x8, y8
充当二元运算符的临时寄存器 ・rax, rbx, eax, ebx, ax, bx, al, bl
・rdx, ebx, dx, dl
・isub8 x8, y8, z8
・umod8 x8, y8, z8
充当一元运算符的临时寄存器 ・rsi, esi, si, sil
・al, rbx
・rax, eax, ax, al, cl
・rax, rbx
・s8convf80 f80, s8
・jumpif br8, ar64
・lshift16 x16
 
充当syscall的参数 ・r8, r9, r10, r12, rdi, rsi, rax, rdx ・syscall6

对于这些寄存器,根据情况1~6,可知、需要撰写以下的x86 指令。

  • 需要撰写把一个寄存器的值赋给另一寄存器的x86的mov指令。(寄存器–>寄存器)
  • 需要撰写将以上的寄存器中的值存入内存的x86的mov指令。(寄存器–>内存)
  • 需要撰写将内存中的值存入以上的寄存器的x86的mov指令。(内存–>寄存器)
  • 需要撰写将立即数存入内存的mov指令。(立即数–>寄存器)

一些其它的组合,比如立即数–>内存和内存–>内存,就是由两个操作头尾相接来完成的:立即数–>内存 就是 立即数–>寄存器–>内存;内存–>内存 就是 内存–>寄存器–>内存。
于是在实作中,就有了以下函数。

立即数–>寄存器 void emitMovabxInst(const char* destreg, unsigned char* data, unsigned datasize, unsigned reg_width);
寄存器–>寄存器 void emitMovRegReg(const char*, const char*);
寄存器–>内存

这要由两步来完成,第一步是将内存地址存入rdi中,然后再将寄存器存入[rdi]中。
第一步:codeMemoryAddressToRDI(ASTNode* mem);
第二步:void emitMovToXXXXPtrRDI(const char*);
其中XXXX可以为Byte、Word、Dword、Qword,依数据长短而定。

内存–>寄存器

同样是由两步来完成,第一步是将内存地址存入rdi中,然后再将寄存器存入[rdi]中。
第一步:codeMemoryAddressToRDI(ASTNode* mem);

第二步:void emitMovFromXXXXPtrRDI(const char*);
其中XXXX可以为Byte、Word、Dword、Qword,依数据长短而定。

这样,就完成了数据搬移的部分,撰写code generator的思路也清晰了很多,可以看见Hello World了。

发表在 Uncategorized | 留下评论

“Grouping” and “Connectivity on a Jigsaw Puzzle”

Over the weekend I attempted to express connectivity on a half-completed Jigsaw puzzle using groups, which as far as I understand, are “collections of elements with the same group_id”.

If we're only allowing pieces to be glued together and only checking legality of connectivy when adding new pieces, the legality would be preserved until all pieces are glued together, regardless of which approach (group or connectivity) is used.

However when deletion is involved, there would be an extra layer of complexity. Example:

+---+-----+    +---+    
| A | B   |    | A |      +-----+
+---+-----+    +---+      |  B  |
| C |                 +---+-----+
+---+                 | C |
                      +---+

CASE 1          CASE 2

Case 1 contains a blob that's a valid connection and a valid “group”. The connection is B–A–C, with 2 edges between 3 nodes.

However after A is removed to become Case 2, the connection between A, B and C are completely destroyed. Both B and C are adjacent to A and removing A removes both edges. A, B and C are no longer connected but B and C can still be a group since a group doesn't have to be connected.

This may lead to a bit of confusion, so I came up with an ugly workaround: When taking an element off a group, the entire group is dismantled. This may work well when the group size is small enough and the user doesn't have to take the effort to piece the huge group together again, but if we need large groups… Maybe connectivity graph is the way to go.

发表在 Uncategorized | 留下评论

Makeshift solution for hashing a list

Python says “A list is not hash-able”:

Traceback (most recent call last):
  File “listexperiments.py”, line 135, in
    ParseDRCExperimentSpaceSpecs();
  File “listexperiments.py”, line 82, in ParseDRCExperimentSpaceSpecs
    g_drc_expspace[key] = 0;
TypeError: unhashable type: 'list'

There must be some deep reason under the unhashability of lists, but now, I need to hash a Java ArrayList, and the purpose is to determine whether two ArrayLists contain the same elements without actually having to walk through the list.

A duct-tape solution seems as trivial as the following:

  • Step 1: Prepare a function that takes into two integers and returns one, for example, XOR. The result of the XOR stores the information coming from its two inputs (it also may lose some information by mapping an 2^64-sized space to a 2^32-sized hash space).
  • Step 2: Keep feeding the Kth element and the number K into the input 1 of the function and its result into the input 2 of the function.

​This approach seems to has measures on a sheet music pretty well.
So going back to the original question: Why does not Python allow hashing a List?
Need read StackOverflow.

——————————————————————————————————————————

I read StackOverflow; it turned out the Hash implementation depends on the `id' of an object. The `id' looked much like the `store' or the `memory address' of a `constant', much like the `constant string literal' in CPPGM PA8 that points to variable with a immutable type. The unhashability is based on the rule: “the id (location) of two mutable objects is irrelevant to whether their contents are identical, so a hash function based on the address of the store of an object can't work for mutable types”.

Mutable types in Python include dict, list and bytearray. They are easier to understand.
Immutable types in Python include tuple and str.

Immutables do not change; although the following statements seem to modify the storage of variables, they actually assign new instances to respective variable names:

a=”blah”; id(a); a=”bleh”; id(a);
139997263880816
139997263880864   // will have different hashes
b=”blah”; id(b);
139997263880816  // hash of b would be the same as hash of old a (“blah”)

It looks very much like taking the address of a constant string in C, which in turn is a const char* pointing to the read only data segment.

So far the explanation seems to be enough for explaining the cases I came across during my daily usage of Python.

发表在 Uncategorized | 留下评论