enter reinforcement

2024-07-06 01:30:00 +08:00
parent f105ba0150
commit e3f8181056
116 changed files with 19698 additions and 0 deletions

@@ -0,0 +1,132 @@
Values at iteration 0 are correct.
Student/correct solution:
values_k_0: """
0.0000
0.0000
0.0000
"""
Q-Values at iteration 0 for action south are correct.
Student/correct solution:
q_values_k_0_action_south: """
illegal
0.0000
illegal
"""
Q-Values at iteration 0 for action west are correct.
Student/correct solution:
q_values_k_0_action_west: """
illegal
0.0000
illegal
"""
Q-Values at iteration 0 for action exit are correct.
Student/correct solution:
q_values_k_0_action_exit: """
-10.0000
illegal
10.0000
"""
Q-Values at iteration 0 for action east are correct.
Student/correct solution:
q_values_k_0_action_east: """
illegal
0.0000
illegal
"""
Q-Values at iteration 0 for action north are correct.
Student/correct solution:
q_values_k_0_action_north: """
illegal
0.0000
illegal
"""
Values at iteration 1 are NOT correct.
Student solution:
values_k_1: """
0.0000
0.0000
0.0000
"""
Correct solution:
values_k_1: """
-10.0000
0.0000
10.0000
"""
Q-Values at iteration 1 for action south are NOT correct.
Student solution:
q_values_k_1_action_south: """
illegal
0.0000
illegal
"""
Correct solution:
q_values_k_1_action_south: """
illegal
5.6250
illegal
"""
Q-Values at iteration 1 for action west are correct.
Student/correct solution:
q_values_k_1_action_west: """
illegal
0.0000
illegal
"""
Q-Values at iteration 1 for action exit are correct.
Student/correct solution:
q_values_k_1_action_exit: """
-10.0000
illegal
10.0000
"""
Q-Values at iteration 1 for action east are correct.
Student/correct solution:
q_values_k_1_action_east: """
illegal
0.0000
illegal
"""
Q-Values at iteration 1 for action north are NOT correct.
Student solution:
q_values_k_1_action_north: """
illegal
0.0000
illegal
"""
Correct solution:
q_values_k_1_action_north: """
illegal
-5.6250
illegal
"""