Getting Started with Keyword Extracting using Google Gemini

Jul 23, 2024

Keyword extraction is important for use cases where an exact word match is required, such as tagging, full-text search, and filtering. Traditional keyword extraction involves a complex natural language processing (NLP) pipeline, but large language models (LLMs) make the task much simpler. In this post, we show how to extract relevant keywords from text using the Google Gemini API.

Setting Up Google Gemini API Key

Sign up for, or sign in to, your Google Gemini account. Then navigate to the API key page and click “Create API Key”. Once your API key is generated, save it somewhere safe and do not share it with anyone.

Install the Google Gemini Python package

!pip install -q -U google-generativeai

Initiate Gemini API client

import google.generativeai as genai

GOOGLE_API_KEY="<YOUR_GEMINI_API_KEY>"
genai.configure(api_key=GOOGLE_API_KEY)
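
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. One common alternative, offered here as a sketch and not as part of the original walkthrough, is to read the key from an environment variable. The helper name `load_api_key` and the variable name `GEMINI_API_KEY` are illustrative assumptions.

```python
import os

def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption for this sketch; use
    whatever name you exported the key under.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return api_key

# Usage, assuming the variable was set before starting Python:
# genai.configure(api_key=load_api_key())
```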

Extract Relevant Keywords

A prompt is an instruction that tells an LLM how to process the input and how to format its response. Prompt engineering is the art of crafting the right prompt to get the best output from an LLM.

Let’s say we want to create a job recommender system. We can use Google Gemini to extract category and skill keywords for each job posting and for a user resume.

Prompt Templates:

prompt_template_job = """
Extract industries, job functions and skill keywords from the following job description. Please provide response as a Python dictionary with the following format:
{
  "industry": list of industries,
  "function": list of job functions,
  "skill": list of skills
}

Job description:
<job_description>
"""

prompt_template_resume = """
Extract industries, roles and skill keywords from the following resume. Please provide response as a Python dictionary with the following format:
{
  "industry": list of industries,
  "function": list of job functions,
  "skill": list of skills
}

Resume:
<resume>
"""

Function to extract relevant keywords from a job description or a resume:

import json

def get_keyword(text, model="gemini-1.5-flash", task="job"):
    # Pick the prompt template and placeholder that match the task
    if task == "job":
        prompt_template = prompt_template_job
        text_to_replace = "<job_description>"
    elif task == "resume":
        prompt_template = prompt_template_resume
        text_to_replace = "<resume>"
    else:
        raise ValueError(f"Unsupported task type: {task}")

    # Collapse line breaks so the input sits on a single line in the prompt
    text = text.replace("\n", " ")
    prompt = prompt_template.replace(text_to_replace, text)

    # Request a JSON response so it can be parsed directly
    gemini_model = genai.GenerativeModel(model)
    result = gemini_model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},
    )
    if result is None:
        print("Error extracting keywords")
        return None

    try:
        return json.loads(result.text)
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError
        print("Error processing JSON response:", error)
        return None

def flatten_keywords(keywords_json):
    # Merge the three keyword lists into one de-duplicated list
    flattened_keywords = []
    for key in ["industry", "function", "skill"]:
        if key in keywords_json:
            flattened_keywords += keywords_json[key]

    return list(set(flattened_keywords))
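
Because the matching step depends on exact word matches, differences in casing or stray whitespace (for example "Python" versus " python ") would count as different keywords. The following is a minimal sketch of a normalization step that could be applied to the extracted lists before comparing them. The function name `normalize_keywords` is hypothetical and not part of the original code.

```python
def normalize_keywords(keywords):
    """Lowercase, trim, and de-duplicate keywords, keeping first-seen order."""
    seen = set()
    normalized = []
    for keyword in keywords:
        # Lowercase and collapse any run of whitespace into a single space
        cleaned = " ".join(str(keyword).lower().split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            normalized.append(cleaned)
    return normalized

print(normalize_keywords(["Python", " python ", "Machine  Learning", ""]))
# ['python', 'machine learning']
```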

Calculate Keyword Similarity

Then, we can calculate a keyword-matching score, such as the Jaccard similarity (the size of the intersection divided by the size of the union), between each job posting’s list of skills and the user resume’s list of skills. The job that is most suitable for the user should have the highest matching score.

def calculate_jaccard_similarity(list1, list2):
    set1 = set(list1)
    set2 = set(list2)
    # Size of the intersection of the two sets
    intersection = len(set1.intersection(set2))
    # Size of the union of the two sets
    union = len(set1.union(set2))

    # Avoid division by zero when both lists are empty
    if union == 0:
        return 0.0
    return intersection / union
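
To illustrate the recommendation step, here is a small sketch that scores a resume against several jobs and sorts them by Jaccard similarity. The keyword lists and job IDs are made-up sample data, and the names `jaccard` and `rank_jobs` are hypothetical. In practice the lists would come from the keyword extraction step.

```python
def jaccard(list1, list2):
    """Jaccard similarity of two keyword lists (0.0 when both are empty)."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 0.0

def rank_jobs(resume_keywords, job_keywords_by_id):
    """Return (job_id, score) pairs sorted from best to worst match."""
    scores = {
        job_id: jaccard(resume_keywords, keywords)
        for job_id, keywords in job_keywords_by_id.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up sample data for illustration only
resume = ["python", "sql", "machine learning"]
jobs = {
    "data-scientist": ["python", "machine learning", "statistics"],
    "frontend-dev": ["javascript", "css", "html"],
}
print(rank_jobs(resume, jobs))
# [('data-scientist', 0.5), ('frontend-dev', 0.0)]
```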

You can see a complete example in this Google Colab Notebook.


Written by Analytics Sense
