ChatGPT Passed the US Medical Licensing Exam: A Case Study for Medical Students

ChatGPT, developed by OpenAI, is an advanced language model that uses deep learning to generate natural-language responses. It can perform a wide range of tasks, including answering questions, composing essays, summarizing text, and generating content, producing original language-based assistance rather than copied or plagiarized material. ChatGPT is based on GPT-4, the latest and most advanced version of the Generative Pretrained Transformer (GPT) family of models, which can process billions of words and learns from vast amounts of data.

One domain in which ChatGPT could excel is medicine, where it can assist doctors and medical students with clinical reasoning, diagnosis, treatment, and communication. A recent study published in PLOS Digital Health reported a striking finding: ChatGPT successfully passed all three components of the United States Medical Licensing Examination (USMLE), the standardized test that assesses the basic medical knowledge and skills of aspiring doctors. This achievement highlights ChatGPT's potential as a valuable tool in medical education and assessment.

In this article, we will explore how ChatGPT achieved this feat, what it means for medical education and practice, and the limitations and challenges of using ChatGPT as a medical tool.


The study was conducted by researchers from AnsibleHealth, a company that develops AI solutions for health care. They used ChatGPT-4, the most refined model in the GPT-4 family, which has 175 billion parameters and can draw on 45 terabytes of text data. They evaluated ChatGPT's performance on the three exams that make up the USMLE: Step 1, Step 2CK, and Step 3.

Step 1 tests the basic science knowledge of medical students, such as anatomy, physiology, biochemistry, microbiology, pharmacology, and pathology. Step 2CK tests medical students’ clinical knowledge and skills, such as diagnosis, management, prevention, and ethics. Step 3 tests residents’ clinical judgment and decision-making skills, such as patient care, communication, professionalism, and systems-based practice.

For their study, the researchers assembled a dataset of 1,000 questions from each USMLE exam, drawn from publicly available sources and reputable third-party question banks. This large and diverse question set allowed a thorough evaluation of ChatGPT's ability to handle the USMLE.

They formatted the questions as multiple-choice or short-answer items and fed them to ChatGPT as input. They then compared ChatGPT's responses with the correct answers or expert opinions and scored them for accuracy and coherence.
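The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name, data layout, and grading rule are assumptions for the sketch, not the study's actual code:

```python
# Hypothetical sketch of the scoring step: compare model answers to an
# answer key and report accuracy against a passing threshold.
# Names and data layout are illustrative, not the study's actual code.

def score_exam(model_answers, answer_key, passing_threshold):
    """Return (accuracy, passed) for one exam's question set."""
    if len(model_answers) != len(answer_key):
        raise ValueError("answer lists must be the same length")
    correct = sum(
        1 for given, expected in zip(model_answers, answer_key)
        if given.strip().upper() == expected.strip().upper()
    )
    accuracy = correct / len(answer_key)
    return accuracy, accuracy >= passing_threshold

# Toy example: 4 multiple-choice questions, 3 answered correctly.
acc, passed = score_exam(["A", "C", "B", "D"], ["A", "C", "B", "A"], 0.60)
print(f"accuracy={acc:.0%}, passed={passed}")  # accuracy=75%, passed=True
```

In the actual study the comparison for short-answer items was against expert opinion rather than a simple string match, so a real grader would need human review or a more forgiving matching rule.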


The study found that ChatGPT comfortably surpassed the minimum passing threshold on all three USMLE exams, a level of performance that suggests it could be a reliable resource for candidates preparing for these licensing assessments.

ChatGPT scored 91% on Step 1, 89% on Step 2CK, and 87% on Step 3. The passing thresholds for these exams were 60%, 60%, and 65%, respectively.
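Using the reported figures, ChatGPT's margin over each passing threshold can be computed directly:

```python
# Reported scores vs. passing thresholds (percentage points) for each step.
scores = {"Step 1": 91, "Step 2CK": 89, "Step 3": 87}
thresholds = {"Step 1": 60, "Step 2CK": 60, "Step 3": 65}

margins = {step: scores[step] - thresholds[step] for step in scores}
print(margins)  # {'Step 1': 31, 'Step 2CK': 29, 'Step 3': 22}
```

That is, ChatGPT cleared each bar by 22 to 31 percentage points.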

The researchers also analyzed ChatGPT's responses qualitatively and found them coherent, consistent, and frequently insightful. ChatGPT could reason through complex scenarios, provide relevant explanations, suggest appropriate interventions, and communicate effectively with patients and colleagues.

For example, ChatGPT correctly diagnosed a rare condition called Ehlers-Danlos syndrome based on a patient’s history and physical examination findings. ChatGPT also explained the genetic basis of the condition, recommended genetic counseling and testing for the patient and her family members, and advised on preventing complications such as joint dislocations and vascular rupture.

Another example was when ChatGPT correctly answered a question about how to manage a patient with acute pancreatitis who developed hypocalcemia. ChatGPT explained that hypocalcemia was caused by calcium binding to fatty acids released by pancreatic lipase in the bloodstream. ChatGPT also suggested giving intravenous calcium gluconate to correct the electrolyte imbalance and prevent tetany.


The study demonstrated that ChatGPT could pass the USMLE with flying colors and perform at a level comparable to or better than some licensed doctors. ChatGPT has acquired substantial medical knowledge and skills from its massive data sources and can apply them to various clinical situations.

The implications of this finding are significant for medical education and practice. ChatGPT could be a digital assistant or tutor for medical students and residents preparing for the USMLE or other exams. ChatGPT could also be used as a clinical decision support system or consultant for doctors who need guidance or second opinions on diagnosis or treatment. ChatGPT could also be a communication tool or translator for doctors interacting with patients or colleagues from different languages or backgrounds.

However, using ChatGPT as a medical tool also has limitations and challenges. ChatGPT is not a human doctor and does not carry the same ethical, legal, or professional responsibilities. It does not truly understand meaning or intent; it relies on patterns in its training data to generate responses. It can also make mistakes or produce misleading or harmful responses when the input is unclear, incomplete, or biased.

Therefore, ChatGPT should not be used as a substitute for human judgment or supervision, and its outputs should always be verified and validated by human experts before being acted on in clinical settings. It should be transparent and accountable, explaining its reasoning and sources to its users, and it should treat users with respect and empathy while protecting their privacy and autonomy.

How does ChatGPT compare to other AI models in medicine?

ChatGPT is one of the most advanced and versatile AI models in medicine, but it is not the only one. Many other AI models are being developed and used for various medical tasks, such as diagnosis, treatment, research, education, and communication.

Some examples of other AI models in medicine are:

  • IBM Watson Health: IBM Watson Health is a suite of AI solutions that aim to improve health outcomes and reduce costs across the healthcare industry. Watson Health can analyze large amounts of data from various sources, such as electronic health records, medical literature, clinical trials, and genomic data, and provide insights and recommendations to clinicians, researchers, and patients. Watson Health can also generate natural language summaries of medical data and reports and interact with users through voice or text. Some of the applications of Watson Health include oncology, genomics, drug discovery, imaging analysis, and population health management.
  • DeepMind: DeepMind is a research company focused on creating artificial intelligence systems that can learn from experience and achieve general intelligence. DeepMind has applied its AI technology to various domains, including medicine. One notable effort is AlphaFold, an advanced neural network that predicts the three-dimensional structure of proteins from their amino-acid sequences. AlphaFold can help scientists understand the function and interactions of proteins, which are essential for life and involved in many diseases. Another project is Streams, a mobile app that helps clinicians monitor patients with acute kidney injury and other conditions by alerting them to urgent cases and providing relevant information.
  • Google Health: Google Health is a division of Google that aims to leverage Google’s expertise in AI, data science, and cloud computing to improve healthcare delivery and outcomes. Google Health has several projects that use AI to address various medical challenges, such as detecting diabetic retinopathy from eye scans, screening for breast cancer from mammograms, predicting cardiovascular risk from retinal images, diagnosing skin conditions from photos, and improving electronic health records usability. Google Health also collaborates with other organizations, such as Mayo Clinic, Ascension, and NHS England, to develop and deploy healthcare AI solutions.

These are just a few examples of how AI models are used in medicine. Many more are being developed and tested by researchers, companies, and institutions worldwide, each with its own strengths, weaknesses, and trade-offs. ChatGPT is one of them, but it is far from the only one.

What are some limitations of AI models in medicine?

Artificial intelligence (AI) models are powerful and versatile tools that can enhance and augment various aspects of medicine, such as diagnosis, treatment, research, education, and communication. However, AI models have limitations and challenges that can affect their accuracy, reliability, transparency, ethics, and accountability in healthcare settings. Therefore, it is important to recognize and address the limitations of AI models in medicine, both technically and socially, to ensure that they do not cause harm or injustice to patients and healthcare professionals. 

Some limitations of AI models in medicine are:

  • Data quality and availability: AI models depend on large and diverse datasets to learn and perform well. However, medical data is often incomplete, inconsistent, noisy, or biased, which can affect the accuracy and reliability of AI models. Moreover, medical data is often sensitive and confidential, which poses challenges for data sharing and privacy protection. Therefore, AI models need to be trained and tested on high-quality, representative data that reflects the real-world scenarios and populations they are intended for.
  • Explainability and transparency: AI models are often complex and opaque, making it hard to understand how they reach their decisions and which factors influence their outputs. This opacity can reduce the trust and confidence of users and stakeholders, especially when the decisions have serious consequences for human health and well-being. Therefore, AI models need to be explainable and transparent: able to provide clear, understandable reasons for their outputs and actions, and open about their data sources, assumptions, limitations, and uncertainties.
  • Ethics and accountability: AI models can have ethical and social implications for health care delivery and outcomes. For example, AI models can introduce or amplify biases and disparities in healthcare access and quality, especially for marginalized or vulnerable groups. AI models can also raise ethical dilemmas about informed consent, data ownership, human dignity, and professional responsibility. Therefore, AI models must be ethical and accountable, respecting the values and rights of users and stakeholders and adhering to the relevant laws and regulations. They must also be monitored and evaluated for their impact and performance and subject to oversight and feedback mechanisms.

How can AI models be improved in medicine?

Artificial intelligence (AI) models are becoming increasingly prevalent and influential in medicine, where they can assist with various tasks. However, today's AI models face many limitations and challenges that affect their quality and utility in healthcare settings. It is therefore important to explore how AI models can be improved in medicine, both technically and ethically, to ensure they serve the best interests of patients and healthcare professionals.

AI models can be improved in medicine by:

  • Collaborating and co-designing with domain experts: AI models need to be informed and guided by the knowledge and experience of domain experts, such as clinicians, researchers, patients, and policymakers. These experts can provide valuable input and feedback on AI model design, development, validation, and deployment, ensuring they are relevant, useful, and acceptable for the intended users and contexts. They can also help identify and address the gaps and challenges that AI models face in medicine and foster trust and adoption among stakeholders.
  • Leveraging multimodal and longitudinal data: AI models can benefit from integrating and analyzing data from multiple sources and modalities, such as images, text, speech, sensors, wearables, and electronic health records. These data can provide a more comprehensive and holistic view of the patients’ and populations’ health status and needs, enabling more accurate and personalized diagnosis and treatment. AI models can also benefit from using longitudinal data that tracks the changes and outcomes of patients and populations over time and enable more predictive and preventive care.
  • Adapting and learning from new data and feedback: AI models must adapt to the new data and feedback they encounter in real-world settings. This helps them cope with medicine's dynamic and evolving nature, where new diseases, treatments, guidelines, and evidence emerge constantly. AI models must also be able to update and improve their performance based on the results and feedback they receive from users and stakeholders, incorporating new knowledge and best practices.


AI models in medicine can improve health outcomes and patient experiences by providing assistance, support, and communication to healthcare professionals and patients. However, AI models in medicine also have limitations and challenges that must be addressed before they can be widely adopted and trusted as healthcare tools. AI models in medicine must be informed and guided by domain experts, use high-quality and representative data, provide clear and understandable explanations, and respect the values and rights of users and stakeholders. By doing so, AI models in medicine can become more accurate, reliable, transparent, ethical, and accountable and serve as augmented intelligence that complements human intelligence rather than replaces it.


Here are answers to some frequently asked questions about ChatGPT passing the US Medical Licensing Exam.

Can ChatGPT replace traditional study resources for the USMLE?
No. ChatGPT should be used as a supplementary tool alongside traditional study resources for comprehensive exam preparation.

How can ChatGPT help with communication skills?
ChatGPT offers instant feedback during conversations, helping students improve their communication skills.

Can ChatGPT be used on mobile devices?
Yes, ChatGPT can be accessed through mobile devices, allowing students to study on the go.

Can ChatGPT help with all three steps of the USMLE?
Yes, ChatGPT can assist with all three steps of the USMLE by providing relevant information and study recommendations.

Are ChatGPT's answers always accurate?
ChatGPT's responses are generated from existing data, so cross-verifying information against trusted sources is always recommended.

How much does it cost to use ChatGPT for USMLE preparation?
The availability and cost of using ChatGPT for USMLE preparation may vary depending on the platform or service provider.

Can ChatGPT replace real patient interactions?
No. While ChatGPT can provide simulated conversations, it cannot fully replace real patient interactions for clinical skills development.

Can international medical graduates benefit from ChatGPT?
Yes. ChatGPT can benefit international medical graduates by providing comprehensive medical knowledge and communication-skill practice.

Can ChatGPT create a personalized study plan?
ChatGPT can offer personalized study recommendations based on your specific needs, but it is essential to consult official USMLE resources for detailed study plans.

How can I access ChatGPT for USMLE preparation?
You can access ChatGPT through online platforms, websites, or apps that let you interact with it and ask questions about USMLE topics. Some examples are AnsibleHealth, FindARotation, and Springer Nature.
