Bakalarska_praca/private_gpt/index_JSON.py
oleh 959a391334
add self code
2024-09-27 18:52:16 +02:00

21 lines
676 B
Python

import json

from elasticsearch import Elasticsearch

# Client for a local Elasticsearch node reached over plain HTTP.
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])


def load_drug_data(json_path):
    """Load the list of drug records from a UTF-8 JSON file."""
    with open(json_path, 'r', encoding='utf-8') as f:
        return json.load(f)


def index_documents(data):
    """Index each record into 'drug_docs', using its list position as the ID."""
    for i, item in enumerate(data):
        # Searchable text: link, package leaflet (pribalovy_letak) and SPC, space-separated.
        doc = f"{item['link']} {item.get('pribalovy_letak', '')} {item.get('spc', '')}"
        es.index(index='drug_docs', id=i, body={'text': doc, 'full_data': item})


data_path = "../data/cleaned_general_info_additional.json"
drug_data = load_drug_data(data_path)
index_documents(drug_data)
print("Indexing complete.")
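
The script above sends one request per record. For larger data sets, a possible alternative is to batch the records through the client's bulk helper. The sketch below is an assumption, not part of the original script: `build_bulk_actions` is a hypothetical helper that only prepares the action dictionaries in the shape `elasticsearch.helpers.bulk` accepts, so it can be checked without a running cluster.

```python
from typing import Any, Dict, Iterable, Iterator


def build_bulk_actions(
    data: Iterable[Dict[str, Any]],
    index_name: str = "drug_docs",
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per record (hypothetical helper, not in the original)."""
    for i, item in enumerate(data):
        # Same searchable text as in index_documents: link, leaflet and SPC.
        text = f"{item['link']} {item.get('pribalovy_letak', '')} {item.get('spc', '')}"
        yield {
            "_index": index_name,
            "_id": i,
            "_source": {"text": text, "full_data": item},
        }


# Assumed usage with the client created in the script:
#     from elasticsearch import helpers
#     helpers.bulk(es, build_bulk_actions(drug_data))
```

Keeping the action building separate from the network call makes the document shape easy to inspect before anything is written to the index.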