Monday, 12 September 2016

Getting AI smarter with Q-learning: a simple first step in Python

Yesterday I found an “old” script I wrote one morning last semester. I remember being a little bored and curious about the concept of Q-learning. That was around the time AlphaGo had beaten the world champion of Go, and by reading here and there I found out that a bit of Q-learning mixed with deep learning might have been involved.

[Image: qlearning]

Indeed, Q-learning seems an interesting concept, perhaps even more fascinating than traditional supervised machine learning, since in this case the machine is basically learning from scratch how to perform a certain task in order to optimize future rewards.

If you would like to read a, quote, “Painless Q-learning tutorial”, I suggest the following explanation: A Painless Q-learning Tutorial. In that article the concept of Q-learning is explained through a simple example and a clear walk-through. After reading it, I decided to put the example into code. The example shows a maze through which the agent should find its way to the goal state (state 5). Basically, the idea is to train an algorithm to find the optimal path, given an (often random) initial condition, in order to maximize a certain outcome. In this simple example, as you can see from the picture shown in the article, the possible choices are all known and the outcome of each choice is deterministic. A best path exists and can be found easily regardless of the initial condition. Furthermore, the maze is time invariant. These nice theoretical hypotheses are usually not true when dealing with real-world problems, and this makes using Q-learning hard in practice, even though the concept behind it is relatively simple.

Before taking a look at the code, I suggest reading the article mentioned above, where you will get familiar with the problem tackled below. I’m not going to explain in depth what is going on, since the author of that article has already done a pretty good job of it and my work is just a (probably horribly inefficient) translation of the algorithm into Python.

Given an initial condition of, say, state 2, the optimal path is clearly 2 - 3 - 1 - 5. Let’s see if the algorithm finds it!
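As a rough illustration of the idea, here is a minimal sketch (not necessarily the exact script discussed in the comments). It assumes the six-state reward matrix and the discount factor of 0.8 from the tutorial; the function names `train` and `greedy_path`, the number of training steps and the random seed are my own choices for this example.

```python
import numpy as np

# Reward matrix of the six-state maze: rows are states, columns are
# actions (the state moved to). -1 marks a move that is not allowed,
# 0 an allowed move, 100 a move into the goal state 5.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
GAMMA = 0.8  # discount factor (assumed, as in the tutorial)
GOAL = 5


def train(R, gamma=GAMMA, steps=10_000, seed=0):
    """Build the Q matrix by exploring with purely random actions."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(R.shape)
    for _ in range(steps):
        state = rng.integers(R.shape[0])         # random starting state
        actions = np.flatnonzero(R[state] >= 0)  # allowed moves only
        action = rng.choice(actions)             # explore at random
        # Taking `action` lands the agent in state `action`, so the
        # look-ahead term is the best Q value in that row.
        Q[state, action] = R[state, action] + gamma * Q[action].max()
    return Q


def greedy_path(Q, start, goal=GOAL, max_steps=20):
    """Follow the highest Q value from `start` until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(int(np.argmax(Q[path[-1]])))
    return path


Q = train(R)
path = greedy_path(Q, start=2)
print(path)
```

Starting from state 2, the printed route should have four states and end in state 5. Moving from 3 to 1 and from 3 to 4 are equally good, so a tie between 2 - 3 - 1 - 5 and 2 - 3 - 4 - 5 is possible.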

And sure enough, there it is! Bear in mind this is a very simplified problem. At the same time, though, keep in mind that AlphaGo is powered in part by a similar concept. I wonder what other applications will come from Q-learning.

The concept of Q-learning is much broader than what I showed here; nevertheless, I hope this post offered an interesting first look.

140 comments:

  1. Hey, I am getting an error in the Q-learning formula. The error is "too many indices for array". How can I solve that?

    Replies
    1. Hi, there seems to be an error in the indexing process, try to start debugging there.

  2. In my example I have 50000 states... and I got a memory error.

  3. Thanks for the nice python implementation - works great on new graphs. One minor nitpick, your graph image doesn't correspond to your reward matrix, it shouldn't go from 3 to 5 and back, instead 3 only goes to points 4 and 1. 5 loops onto itself for 100 points. Thanks for posting this!

    Manuel

    Replies
    1. Hi, thanks for reading! Oh, I didn't notice that! I didn't want to take the picture from the original article so I made one myself in a hurry. Thanks for pointing out the mistake, fixed it! :)

  4. Hi Mic, can you tell me why the next action is chosen at random (lines 29-33)? I thought in Q-learning the next action is chosen based on the highest Q value. Best, Nina

    Replies
    1. Hi Nina, the next action is chosen at random only during the training of the agent in order to explore the environment (and build the Q matrix). In testing (lines 71->89) the next action is chosen according to the highest Q value as you expect.

  5. In the update function, why are we selecting the Q max index based on the randomly selected action going into what I thought was the state field of Q? Such that Q(state, action) returns the Q-value for that specific pair, why are we inserting action into state field? (Q[action,])? Shouldn't we want to insert state there to get index from that?

  6. It's a fantastic example for designing MDP based algorithms for many randomly generated environments.
    May I ask a question? Do you have an example like this for multiple agents, such as team-Q (friend-Q) learning for a cooperative mission?

    Replies
    1. Thanks! I'm sorry, as of now I haven't got any examples for multi agents.

    2. Hello,
      how can a Q-learning algorithm be adapted to games such as shogi or chess?

  7. Hi! Great article, combined with the painless Q-learning tutorial! I'm interested in developing a reinforcement learning algorithm for a Flash game. If I'm correct, you don't create any walls or rooms; your agent is just wandering around. So do I just need to get data from the Flash game and change the Python algorithm?

    Replies
    1. Hi, as long as you can build the R matrix you can adapt this example by simply using your R matrix. However I think it is going to be a bit slow with larger R matrices.

  8. Hi,
    Thanks for this great article! By the way, how many layers are in the NN you have implemented here? Can we specify the number of hidden layers in the network in reinforcement learning?

  9. def sample_next_action(available_actions_range):
    ... next_action = int(np.random.choice(available_act,1))
    File "", line 2
    next_action = int(np.random.choice(available_act,1))
    ^
    IndentationError: expected an indented block

    any idea why?

  16. Thanks! If I understand correctly, the 4th row in the R matrix should be [0, -1, -1, 0, -1, 100].

    Replies
    1. Yeah, I think so too; this is one small mistake.

  19. Do you have any example of anomaly detection using Q-learning?
