Sometimes we’d like to fetch a small subset of wells from a large project based on attributes like reserve category, custom strings, etc.
Currently the get-project-company-wells and get-project-wells endpoints only allow filtering by API10/12/14, County, State, Well Name, Chosen ID, INPT ID, Created At, Updated At, Data Source, Current Operator.
Because the filtering options don’t include some of the other attributes we’d like to filter by (e.g., reserve category), we have to download headers for every well and then filter client-side instead. For larger projects (~tens of thousands of wells) this can take a very long time.
This is also complicated for endpoints like get-forecasts because we don’t have a good way to filter that list down, so instead we need to fetch all forecasts in the project (potentially very long response), join it with the project wells client-side, and then filter it down to the forecasts we actually need.
To reduce the amount of data transferred, it would be useful to be able to do the following:
For well header endpoints, it would be helpful to expose more filter options (most text attributes, but especially common options like reserve category/subcategory, custom strings, etc.).
For forecast endpoints (or other places where only IDs are available), it would be helpful to allow us to provide a set of IDs to include as part of the request. This would probably be too long to pass in as a URL parameter for a GET request, but ideally we could pass it in a request body for more complex options (e.g., maybe a special POST endpoint).
In some cases this would drastically reduce how much data would need to be transferred. People can work around it by creating new projects only containing the wells they want to download, but this can get a bit messy if the wells are project wells. It can also be a bit error prone when trying to sync the extra project regularly.
I just thought I’d mention this here because we’ve seen a few companies using large projects for day-to-day work but only wanting to export small subsets. I’d guess lots of other people run into this too. Thanks!
At this time we do not have plans to expand the filtering available on the Well endpoints. This expanded filtering has the potential to impact performance. For many of our endpoints that return data scoped to the Well level we do offer a well filter. This filter should already be in place for the following forecast endpoints: get-forecast-monthly-volumes, get-forecast-daily-volumes, get-forecast-outputs, get-aries-forecast. Were there other endpoints where you might expect a well filter that has not already been provided?
Thanks @Jeff_Hopkins! I’m hoping the extra filtering would help avoid the overhead of processing/serializing all of the extra responses being sent back currently.
The workflow looks roughly something like this:
1. I want to get all forecasts for 1000 undeveloped wells (PUDs) in a 20,000-well project containing developed and undeveloped wells (PDPs + PUDs).
2. I don’t have a way to filter down to PUDs server-side, so I need to get all well headers (20,000) and filter client-side down to the 1000 PUDs I’m interested in.
3. Then I need to download all forecasts so I can filter those down to the 1000 PUDs I’m interested in. I do this by checking if the well ID is part of the filtered set I just created in the previous step.
I can’t use the well filter because it would mean making 1000 individual requests to the forecast endpoint in step (3), i.e., one request per well. Instead I download everything and filter it client-side in both steps (2) and (3). Please let me know if there are any other ways I might be able to accomplish this.
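To make the workflow concrete, here is a rough sketch of the client-side filter-and-join it describes. The `reserve_category` field name and the record shapes are assumptions for illustration; the in-memory lists stand in for the paginated header/forecast responses:

```python
# Client-side filter-and-join sketch. In the real workflow, `headers` and
# `forecasts` would come from paginating the well header and forecast
# endpoints; here they are tiny in-memory stand-ins.

def filter_pud_well_ids(headers):
    """Keep only undeveloped wells; 'reserve_category' is an assumed field name."""
    return {h["id"] for h in headers if h.get("reserve_category") == "PUD"}

def join_forecasts(forecasts, well_ids):
    """Keep only forecasts whose well ID is in the filtered set."""
    return [f for f in forecasts if f["well"] in well_ids]

headers = [
    {"id": "w1", "reserve_category": "PDP"},
    {"id": "w2", "reserve_category": "PUD"},
]
forecasts = [{"well": "w1", "eur": 100}, {"well": "w2", "eur": 50}]

puds = filter_pud_well_ids(headers)
pud_forecasts = join_forecasts(forecasts, puds)  # forecasts for the PUD wells only
```

The point of the feature request is that both lists have to be downloaded in full before this filtering can happen.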
Are you using the get-forecast-outputs endpoint? That well filter is limited to one id value only right now, but I could bring it up with the team and see if we could expand that a bit. Several of the other endpoints actually allow filtering on more than one Well at a time. For example, the get-forecast-monthly-volumes endpoint will allow you to do something like this to filter on multiple Wells: `{{baseUrl}}/v1/projects/{{projectId}}/forecasts/{{forecastId}}/monthly-volumes?skip=0&take=25&well={{wellId1}}&well={{wellId2}}&well={{wellId3}}`. For that endpoint you can supply up to 100 Well id values per request.
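For anyone following along, the chunking side of this is straightforward. A minimal sketch (the base URL is a placeholder; only the ≤100-ids-per-request grouping and the repeated `well` parameter are the point):

```python
# Split a list of well IDs into groups of at most 100 and build one query
# string per group, repeating the `well` parameter as in the example above.

def chunk(ids, size=100):
    """Split a list of IDs into consecutive groups of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def build_query(base_url, well_ids, skip=0, take=25):
    """Build the query string with one repeated `well` parameter per ID."""
    wells = "&".join(f"well={w}" for w in well_ids)
    return f"{base_url}/monthly-volumes?skip={skip}&take={take}&{wells}"

well_ids = [f"well{i}" for i in range(250)]
groups = chunk(well_ids)  # 3 groups: 100 + 100 + 50
urls = [build_query("https://example/v1/projects/p/forecasts/f", g) for g in groups]
```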
You might also consider trying out our new POST forecast parameters export endpoint. With this endpoint you first execute a POST request with your filter parameters in the body. This returns a job id that you can then pass back into the corresponding GET method. You can then poll this GET method until it returns a successful status. This response will include an array of urls that point to parquet files containing the forecast parameter data. This endpoint does allow you to filter on multiple wells as well if you don’t want to retrieve data for all wells in the forecast. If you are dealing with large forecasts this has the potential to be much faster than paginating through the get-forecast-outputs endpoint.
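The POST-then-poll flow could look roughly like the sketch below. The endpoint paths, status strings, and response shapes here are assumptions for illustration only (check the API reference for the real contract); `post` and `get` stand in for an HTTP client such as `requests.post`/`requests.get`:

```python
import time

def run_export(post, get, base_url, well_ids, poll_seconds=5, timeout=600):
    # 1) Kick off the export with the well filter in the request body.
    #    (Path and body shape are assumed, not the documented contract.)
    job = post(f"{base_url}/forecast-parameters/exports", json={"wells": well_ids})
    job_id = job["id"]
    # 2) Poll the corresponding GET until the job reports success.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get(f"{base_url}/forecast-parameters/exports/{job_id}")
        if status["status"] == "completed":
            return status["urls"]  # links to the generated parquet files
        if status["status"] == "failed":
            raise RuntimeError("export failed")
        time.sleep(poll_seconds)
    raise TimeoutError("export did not finish in time")
```

Injecting the HTTP client as `post`/`get` callables keeps the polling logic testable without touching the network.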
Specifying multiple wells in the URL parameter could help to reduce data transfer. I think the complication with that approach is that splitting a filter of 1000 well IDs into 100 well IDs per request makes it hard to handle pagination.
Thanks for the new endpoint suggestion. I can try that out and see how it compares to my current approach.
We put in a change to update the well filter to allow up to 100 id values for the get-forecast-outputs endpoint. That should go out with the next release which is tentatively planned for early next week. As for the pagination, the limit on the number of ids to be passed in does complicate things, but is necessary for performance reasons on our side. I would do something like this to handle it:
- Chunk the incoming well ids into groups of 100.
- Use a for loop to iterate through the chunks of well ids.
- For each chunk:
  - Initialize a skip variable with a value of zero.
  - Execute a do…while loop. In each iteration:
    - Execute your GET request with a take of 1000 and the current skip value.
    - Yield the resulting records or add them to an existing results array that will be returned later.
    - Set skip += the number of records returned.
  - Your while condition then checks whether the number of returned records is < 1000:
    - If it is less than 1000, you are on the last page of data and can exit the do…while.
    - If it is equal to 1000, there may be another page of data and the loop will execute again.
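The steps above can be sketched in Python like this. The `fetch_page` callable stands in for the GET request to get-forecast-outputs and should return a list of records; the limits are made parameters so the pattern is easy to test:

```python
# Chunk-then-paginate pattern: for each group of ≤100 well ids, page through
# results until a short page signals the last page for that group.

def fetch_all(fetch_page, well_ids, chunk_size=100, page_size=1000):
    results = []
    # Chunk the incoming well ids into groups of `chunk_size`.
    for i in range(0, len(well_ids), chunk_size):
        group = well_ids[i:i + chunk_size]
        skip = 0  # initialize skip outside the paging loop
        while True:  # Python version of the do…while
            records = fetch_page(wells=group, skip=skip, take=page_size)
            results.extend(records)
            skip += len(records)
            if len(records) < page_size:
                break  # short page: last page of data for this group
    return results
```

Note that when a page comes back exactly full, one extra request is made to confirm there is no more data, which matches the loop condition described above.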
@Jeff_Hopkins I’m testing the Parquet forecast parameter export, but it seems to get stuck in the “running” state. I tried waiting 50 minutes, and I tried several variations of the export (e.g., a single well ID from a larger project).
In comparison the JSON outputs endpoint takes about 2 minutes for ~26,000 forecasts over 130 requests (26,000 chunked into 200 forecasts per request) for the same project/forecast combination.
Do these endpoints work differently, or are there any known limitations of the Parquet export that I might be running into? Thanks!
@josh Short answer: the Parquet export should take 1-2 minutes to generate. Send us an email with the details of the params you are using, and we will track this down.
Hi, @josh thanks for the information you shared over email; it helped us identify an issue we were encountering when moving the parquet file to the bucket.
The fix has been deployed, and you should now be able to retrieve the parquet file after generating the export. As Danny mentioned, the process may take a few minutes depending on the wells involved.
Please let us know if you run into any further issues, and thank you for your patience.
@Angel_Torres I’ve been trying out the Parquet export for production forecast outputs and it seems to work well. The good news is that it scales fine to thousands of wells with about the same overhead. However, I’ve noticed that the overhead seems to be about 1-2 minutes even for extremely small exports (tens of wells).
Is it possible the overhead of 1-2 minutes could be reduced, especially for smaller projects?
Hi @josh Glad to hear the Parquet export is working well for you, even at larger scales.
The 1–2 minute delay is coming from the file generation process itself, and at the moment that overhead is fairly consistent regardless of project size. As we continue to improve and optimize the export workflow, we expect that time to come down, especially for smaller projects.
Let me know if you have any other questions or run into anything else.