Generative AI and Information Privacy
Author: Miquel Lara and Juan Ramón González
Date: 12 April, 2024
Category: Data science
Information Privacy
All current advances in Generative AI share a challenge that often complicates their adoption: the privacy of the information provided to these systems. Data has already leaked through the use of these technologies: sensitive code from major companies was exposed after several employees pasted it into ChatGPT, and on other occasions private information belonging to other users has been extracted.
One of the main concerns when we discuss with our clients the strategy for deploying Generative AI solutions is control over their data. We always focus on preventing leaks and on avoiding the need to send their data to external services.
Today, all major cloud providers, such as AWS, Azure and GCP, have their own LLM services. One of the core capabilities of our framework is therefore the ability to deploy Generative AI systems on any cloud provider, as well as in on-premise environments, without modifying our code. This way, we can focus on building the application, with the peace of mind of knowing that the data stays safe within the corporate environment.
In our Gen AI Framework we have integrated two core elements:
- Data storage: from open source vector stores such as FAISS to as-a-service databases from different cloud providers, such as Azure Cognitive Search or Amazon OpenSearch, so that storage is native to the cloud environment.
- LLM: integration with the models available in each cloud provider and with on-premise deployments, so that both the creation of embeddings and the calls to the LLMs take place in the client’s secure environment.
Data Storage
One of the main techniques used by the Gen AI Framework is Retrieval Augmented Generation (RAG). This technique draws on additional data sources to extend the knowledge of the AI without having to adapt or train the model beforehand. It relies on specialized databases, so-called vector databases, which allow generative models to search for relevant texts and use them to answer queries. Since these databases may contain sensitive data, we have chosen to support both open source technologies and the cloud-native technologies offered by the cloud providers.
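The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Gen AI Framework's implementation: the function names (`embed`, `retrieve`, `build_prompt`) and the sample documents are hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved texts in the prompt; the model itself is not retrained."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical private documents that never leave the local environment.
DOCS = [
    "Invoices are archived for five years in the finance vault.",
    "The cafeteria opens at eight in the morning.",
    "Expense reports must be approved by a manager.",
]

if __name__ == "__main__":
    question = "How long are invoices archived?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

In a real deployment the `embed` and `retrieve` steps would be delegated to an embedding model and a vector database, but the flow stays the same: search first, then pass the retrieved texts to the model as context.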
Open Source
On the open source side, there is native integration with FAISS (Facebook AI Similarity Search), a library developed by Meta that searches the stored contents by semantic similarity. It can be deployed privately on your own infrastructure and, because it is easy to set up, it is very useful for rapid prototyping without provisioning additional cloud services. The framework can deploy the database and manage its contents efficiently.
Cloud
In the case of cloud technologies, for now the Gen AI Framework supports Azure Cognitive Search and Amazon OpenSearch. These are cloud services that work like any other cloud database and offer the same security, access control and privacy guarantees as the other services available on AWS and Azure. Once they are configured, you can connect to them, manage their content and use them in user queries.
Large Language Models (LLM)
The models used to generate the responses must also be taken into account when maintaining data privacy. Models can be deployed either on your own infrastructure or in the cloud, but at the Mática Group we have focused on cloud deployment.
Deploying models requires powerful hardware, with high memory, CPU and even GPU requirements, which makes them harder to manage, especially in multi-user environments. Fortunately, cloud providers offer a large number of models, both proprietary and open source, that can be deployed privately and that we can use in the framework as agents.
If we focus on the different cloud providers, each one offers the following:
- Azure
Azure offers OpenAI's proprietary models, packaged in the Azure OpenAI service. These models perform well, since they are the same ones used by ChatGPT, but they come with the guarantee that the data entered will not be used to train the models.
Billing is per token, similar to OpenAI's API, and because the service runs on the Azure cloud, access to the models can be restricted through private endpoints and virtual networks.
- Amazon Web Services
Amazon’s platform for Generative AI is Amazon Bedrock, a managed service for generative models. Unlike Azure, which relies on OpenAI’s GPT models, Bedrock offers both Amazon’s own models (Amazon Titan) and models from third parties, including open ones such as Llama 2 and Mistral and proprietary ones such as Claude. These models are deployed privately and are exposed through a pay-per-use API.
Use within the Gen AI Framework
All these types of models and databases can be used interchangeably within the framework, according to the needs of the project. This provides maximum flexibility, since you are not limited to the solutions of a single provider, and it allows quick testing for each use case by running the same queries on different environments.
Thanks to deployments in private environments, at the Mática Group we guarantee the security of user data and make it possible for new clients to benefit from the great potential of Generative AI.
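One common way to obtain this kind of interchangeability is to code the application against small interfaces and select the concrete backend by configuration. The sketch below is an assumption about how such a design could look, not the framework's actual code: `VectorStore`, `LLM`, `Pipeline`, `LLM_REGISTRY` and `build_pipeline` are hypothetical names, and the backends are local stubs rather than real Azure or AWS clients.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class VectorStore(Protocol):
    def search(self, query: str, k: int) -> list[str]: ...


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class InMemoryStore:
    """Stub standing in for a local store such as FAISS (keyword overlap only)."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = list(docs)

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(words & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class EchoLLM:
    """Stub standing in for a privately deployed model; it only reports its input size."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {len(prompt)} chars of prompt received"


@dataclass
class Pipeline:
    store: VectorStore
    llm: LLM

    def ask(self, question: str, k: int = 1) -> str:
        # Application logic depends only on the two interfaces above.
        context = "\n".join(self.store.search(question, k))
        return self.llm.complete(f"Context:\n{context}\n\nQuestion: {question}")


# Hypothetical registry: swapping provider is a configuration change, not a code change.
LLM_REGISTRY: dict[str, Callable[[], LLM]] = {
    "azure": lambda: EchoLLM("azure-stub"),
    "aws": lambda: EchoLLM("aws-stub"),
}


def build_pipeline(provider: str, docs: list[str]) -> Pipeline:
    return Pipeline(store=InMemoryStore(docs), llm=LLM_REGISTRY[provider]())


if __name__ == "__main__":
    docs = ["alpha beta", "gamma delta"]
    for provider in ("azure", "aws"):
        print(build_pipeline(provider, docs).ask("gamma"))
```

Because `Pipeline.ask` never references a specific provider, the same query can be run against each environment by changing only the `provider` argument.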