The other day I wrote a little about my Q-learning reinforcement learning program.
I have been running the program for 24 hours, after finding a way to reduce the original 160 billion "states" to 84 million, and then further down to 65,535 for this test.
-----
When I counted the distinct states the learning object had actually visited in the simulation, there were only
482
In other words,
it had reached only 0.735% of the 65,535 states.
-----
This was a great shock to me.
Currently, the object in my program completes one trial in under 2 seconds, so about 40,000 trials have finished.
Since each trial involves about 12,000 state transitions, the object should have made over 500 million state changes by now.
With that many transitions, it should have been able to try far more states than the mere 65,535 available.
If you know Q-learning, you might suspect, "That is just the result of the learning converging." However, in this case I set the random-action rate to 90%, so convergence cannot be the reason.
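This counterintuitive result is easy to reproduce with a toy model. The sketch below is my own simplified stand-in, not the actual program: it puts a mostly-random agent (90% random actions, like the setting above) on a 1-D chain of 65,536 states where each transition can only move to a neighboring state. Even after hundreds of thousands of steps, the positional constraint on transitions keeps the number of distinct states visited tiny compared to the size of the state space.

```python
import random

def count_visited_states(n_states=65536, n_trials=50,
                         steps_per_trial=12_000, epsilon=0.9, seed=0):
    """Count distinct states visited by a mostly-random agent on a
    1-D chain of n_states states.

    Transitions are positionally constrained: from state s the agent
    can only reach s - 1 or s + 1, so random exploration covers far
    fewer states than the raw state count suggests.
    """
    rng = random.Random(seed)
    visited = set()
    for _ in range(n_trials):
        s = n_states // 2                # every trial starts from the same state
        for _ in range(steps_per_trial):
            visited.add(s)
            if rng.random() < epsilon:
                step = rng.choice((-1, 1))  # 90%: random exploration
            else:
                step = 1                    # 10%: stand-in for the greedy action
            s = min(n_states - 1, max(0, s + step))  # stay inside the chain
    return len(visited)

# 600,000 total steps, yet coverage stays far below 65,536 states
print(count_visited_states())
```

The chain environment is of course an assumption for illustration; the real state space and transition rules differ. But the mechanism is the same: when each state can only transition to a handful of neighbors, "free" random action does not translate into free movement through the state space.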
----
By the way,
"There are 100 ways to live freely ... (*)"
(*) "Feel the wind" / Shogo Hamada
I was a junior high school student when that song was used in a cup noodle commercial.
Even back then, I retorted,
"Don't be stupid!"
"Try writing down just ten of them!"
playing the contrarian, always leaning in a slightly different direction from everyone else.
-----
What this Q-learning reinforcement learning program showed me again is this:
Shogo Hamada sings about the state space of life (the "65,535 ways" of my experiment),
however,
he completely ignores the positional and temporal constraints on state transitions.
This time, the object I created was allowed 500 million actions with no restrictions at all.
Even so, it could not reach even 500 distinct states.
----
From this viewpoint, Shogo Hamada should have sung
"There is no way to live freely ..."
and that would have been the absolutely correct message for teenagers.
In fact, I think the results of this program are in line with how the real world works.