openai 0.27.4 -> 0.28.1 (to current)
- all requirements satisfied
this should be non-breaking for everyone; the old version is obsolete
---
then
-> tiktoken 0.5.1 (pip install tiktoken)
- will need regex>=2022.1.18
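A quick way to confirm the versions listed above before and after upgrading. This is only a sketch: the helper names (installed_version, version_tuple, meets_minimum) are made up for illustration, and the version parsing is deliberately naive (numeric dot-separated parts only, so pre-release suffixes are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Naive parse: keep only the purely numeric dot-separated parts."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(package, minimum):
    """True if the package is installed at or above the given minimum version."""
    found = installed_version(package)
    return found is not None and version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    # Package names and minimums are taken from the notes above.
    for name in ("openai", "tiktoken", "regex"):
        print(name, installed_version(name))
    print("regex new enough:", meets_minimum("regex", "2022.1.18"))
```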
after installation, tiktoken downloads about 3MB of encoding files into its cache on first use; to trigger the download, run:
>>> import tiktoken
>>> encoding = tiktoken.get_encoding('p50k_base')
>>> encoding = tiktoken.get_encoding('r50k_base')
>>> encoding = tiktoken.get_encoding('cl100k_base')
>>> encoding.encode("hello")
[15339]
hopefully the system-wide cache can be made readable from user space: the library uses the os module and takes the cache location from os.environ.
self-documentation, from tiktoken's load.py:
def read_file_cached(blobpath: str) -> bytes:
    if "TIKTOKEN_CACHE_DIR" in os.environ:
        cache_dir = os.environ["TIKTOKEN_CACHE_DIR"]
    elif "DATA_GYM_CACHE_DIR" in os.environ:
        cache_dir = os.environ["DATA_GYM_CACHE_DIR"]
    else:
        cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")

    if cache_dir == "":
        # disable caching
        return read_file(blobpath)

    cache_key = hashlib.sha1(blobpath.encode()).hexdigest()

    cache_path = os.path.join(cache_dir, cache_key)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()

    contents = read_file(blobpath)

    os.makedirs(cache_dir, exist_ok=True)
    tmp_filename = cache_path + "." + str(uuid.uuid4()) + ".tmp"
    with open(tmp_filename, "wb") as f:
        f.write(contents)
    os.rename(tmp_filename, cache_path)

    return contents
Otherwise the system-installed code would download the files again for each object instantiation, or fail outright if `os` is not available. Unmodified user code would also seem to need os.
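To check whether a shared cache would actually be picked up, the lookup in the quoted function can be mirrored without importing tiktoken. The following is a rough sketch, not part of tiktoken: the helper names, the /var/cache/tiktoken directory, and the example.com blob URL are all placeholders chosen for illustration.

```python
import hashlib
import os
import tempfile


def resolve_cache_dir(environ):
    """Mirror the lookup order of the quoted function; "" means caching is off."""
    if "TIKTOKEN_CACHE_DIR" in environ:
        return environ["TIKTOKEN_CACHE_DIR"]
    if "DATA_GYM_CACHE_DIR" in environ:
        return environ["DATA_GYM_CACHE_DIR"]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")


def expected_cache_path(blobpath, environ):
    """Return the file the quoted function would read, or None if caching is off."""
    cache_dir = resolve_cache_dir(environ)
    if cache_dir == "":
        return None
    cache_key = hashlib.sha1(blobpath.encode()).hexdigest()
    return os.path.join(cache_dir, cache_key)


def is_cached_and_readable(blobpath, environ):
    """True if the cache file exists and the current user may read it."""
    path = expected_cache_path(blobpath, environ)
    return path is not None and os.path.isfile(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    # Hypothetical shared location; it must be set before tiktoken is first used.
    os.environ.setdefault("TIKTOKEN_CACHE_DIR", "/var/cache/tiktoken")
    demo_blob = "https://example.com/encodings/demo.tiktoken"  # placeholder URL
    print(expected_cache_path(demo_blob, os.environ))
    print(is_cached_and_readable(demo_blob, os.environ))
```

If an administrator pre-populates the directory named by TIKTOKEN_CACHE_DIR and makes its files world-readable, user processes that inherit the variable should read from it instead of downloading. Note that the quoted function still calls os.makedirs on the cache directory when a file is missing, so a read-only cache only helps for encodings that are already present.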